Abstract

We introduce the smoothed analysis of algorithms, which continuously in- 
terpolates between the worst-case and average-case analyses of algorithms. 
In smoothed analysis, we measure the maximum over inputs of the expected 
performance of an algorithm under small random perturbations of that in- 
put. We measure this performance in terms of both the input size and the 
magnitude of the perturbations. We show that the simplex algorithm has 
smoothed complexity polynomial in the input size and the standard deviation 
of Gaussian perturbations. 

1 Introduction



The Analysis of Algorithms community has been challenged by the existence of remarkable 
algorithms that are known by scientists and engineers to work well in practice, but whose 
theoretical analyses are negative or inconclusive. The root of this problem is that algorithms 
are usually analyzed in one of two ways: by worst-case or average-case analysis. Worst- 
case analysis can improperly suggest that an algorithm will perform poorly by examining its 
performance under the most contrived circumstances. Average-case analysis was introduced 
to provide a less pessimistic measure of the performance of algorithms, and many practical 
algorithms perform well on the random inputs considered in average-case analysis. However, 
average-case analysis may be unconvincing as the inputs encountered in many application 
domains may bear little resemblance to the random inputs that dominate the analysis. 

We propose an analysis that we call smoothed analysis which can help explain the success 
of algorithms that have poor worst-case complexity and whose inputs look sufficiently dif- 
ferent from random that average-case analysis cannot be convincingly applied. In smoothed 
analysis, we measure the performance of an algorithm under slight random perturbations of 
arbitrary inputs. In particular, we consider Gaussian perturbations of inputs to algorithms 
that take real inputs, and we measure the running times of algorithms in terms of their 
input size and the standard deviation of the Gaussian perturbations. 

We show that the simplex method has polynomial smoothed complexity. The simplex 
method is the classic example of an algorithm that is known to perform well in practice but 
which takes exponential time in the worst case [KM72, Mur80, GS79, Gol83, AC78, Jer73,
AZ99]. In the late 1970s and early 1980s, the simplex method was shown to converge in
expected polynomial time on various distributions of random inputs by researchers including
Borgwardt, Smale, Haimovich, Adler, Karp, Shamir, Megiddo, and Todd [Bor80, Bor77,
Sma83, Hai83, AKS87, AM85, Tod86]. These works introduced novel probabilistic tools
to the analysis of algorithms, and provided some intuition as to why the simplex method 
runs so quickly. However, these analyses are dominated by "random looking" inputs: even 
if one were to prove very strong bounds on the higher moments of the distributions of 
running times on random inputs, one could not prove that an algorithm performs well in 
any particular small neighborhood of inputs. 

To bound expected running times on small neighborhoods of inputs, we consider linear 
programming problems in the form 

    maximize    $z^T x$
    subject to  $A x \le y$,    (1)

and prove that for every vector $z$ and every matrix $\bar{A}$ and vector $\bar{y}$, the expectation over
Gaussian perturbations $A$ and $y$ of $\bar{A}$ and $\bar{y}$, of standard deviation $\sigma \max_i \|(\bar{y}_i, \bar{a}_i)\|$, of the
time taken by a two-phase shadow-vertex simplex method to solve such a linear program is
polynomial in $1/\sigma$ and the dimensions of $A$.

1.1 Linear Programming and the Simplex Method

It is difficult to overstate the importance of linear programming to optimization. Linear 
programming problems arise in innumerable industrial contexts. Moreover, linear programming
is often used as a fundamental step in other optimization algorithms. In a linear
programming problem, one is asked to maximize or minimize a linear function over a poly- 
hedral region. 

Perhaps one reason we see so many linear programs is that we can solve them efficiently. 
In 1947, Dantzig [Dan51] introduced the simplex method, which was the first practical approach
to solving linear programs and which remains widely used today. To state it roughly,
the simplex method proceeds by walking from one vertex to another of the polyhedron de- 
fined by the inequalities in (1). At each step, it walks to a vertex that is better with respect 
to the objective function. The algorithm will either determine that the constraints are 
unsatisfiable, determine that the objective function is unbounded, or reach a vertex from 
which it cannot make progress, which necessarily optimizes the objective function. 

Because of its great importance, other algorithms for linear programming have been 
invented. In 1979, Khachiyan [Kha79] applied the ellipsoid algorithm to linear programming 
and proved that it always converged in time polynomial in d, n, and L — the number of bits 
needed to represent the linear program. However, the ellipsoid algorithm has not been 
competitive with the simplex method in practice. In contrast, the interior-point method 
introduced in 1984 by Karmarkar [Kar84], which also runs in time polynomial in d, n, and 
L, has performed very well: variations of the interior point method are competitive with 
and occasionally superior to the simplex method in practice. 

In spite of half a century of attempts to unseat it, the simplex method remains the most 
popular method for solving linear programs. However, there has been no satisfactory the- 
oretical explanation of its excellent performance. A fascinating approach to understanding 
the performance of the simplex method has been the attempt to prove that there always 
exists a short walk from each vertex to the optimal vertex. The Hirsch conjecture states 
that there should always be a walk of length at most $n - d$. Significant progress on this
conjecture was made by Kalai and Kleitman [KK92], who proved that there always exists
a walk of length at most $n^{\log_2 d + 2}$. However, the existence of such a short walk does not
imply that the simplex method will find it. 
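
To get a feel for the gap between the conjectured bound and the proved one, the following minimal sketch evaluates both for a few sizes. It assumes that n counts the inequality constraints and d the variables, as in the running-time bounds quoted above; the function names are illustrative only.

```python
import math


def hirsch_bound(n: int, d: int) -> int:
    """Walk length n - d conjectured by Hirsch (assumes n constraints, d variables)."""
    return n - d


def kalai_kleitman_bound(n: int, d: int) -> float:
    """Walk length n^(log2(d) + 2) proved by Kalai and Kleitman [KK92]."""
    return n ** (math.log2(d) + 2)


if __name__ == "__main__":
    for n, d in [(20, 5), (100, 10), (1000, 50)]:
        print(n, d, hirsch_bound(n, d), f"{kalai_kleitman_bound(n, d):.3g}")
```

The proved bound is quasi-polynomial, so it grows far faster than the conjectured linear one even for modest n and d.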

A simplex method is not completely defined until one specifies its pivot rule — the method 
by which it decides which vertex to walk to when it has many to choose from. There 
is no deterministic pivot rule under which the simplex method is known to take a sub- 
exponential number of steps. In fact, for almost every deterministic pivot rule there is a 
family of polytopes on which it is known to take an exponential number of steps [KM72,
Mur80, GS79, Gol83, AC78, Jer73]. (See [AZ99] for a survey and a unified construction of
these polytopes.) The best present analysis of randomized pivot rules shows that they take
expected time $n^{O(\sqrt{d})}$ [Kal92, MSW96], which is quite far from the polynomial complexity
observed in practice. This inconsistency between the exponential worst-case behavior of the
simplex method and its everyday practicality leaves us wanting a more reasonable theoretical
analysis.

Various average-case analyses of the simplex method have been performed. Most relevant
to this paper is the analysis of Borgwardt [Bor77, Bor80], who proved that the simplex
method with the shadow vertex pivot rule runs in expected polynomial time for polytopes
whose constraints are drawn independently from spherically symmetric distributions (e.g.,
Gaussian distributions centered at the origin). Independently, Smale [Sma83, Sma82] proved
bounds on the expected running time of Lemke's self-dual parametric simplex algorithm on 
linear programming problems chosen from a spherically-symmetric distribution. Smale's
analysis was substantially improved by Megiddo [Meg86]. 

While these average-case analyses are significant accomplishments, it is not clear whether 
they actually provide intuition for what happens on typical inputs. Edelman [Ede92] writes 
on this point: 

What is a mistake is to psychologically link a random matrix with the intu- 
itive notion of a "typical" matrix or the vague concept of "any old matrix." 

Another model of random linear programs was studied in a line of research initiated 
independently by Haimovich [Hai83] and Adler [Adl83]. Their works considered the maxi- 
mum over matrices, A, of the expected time taken by parametric simplex methods to solve 
linear programs over these matrices in which the directions of the inequalities are chosen at 
random. As this framework considers the maximum of an average, it may be viewed as a 
precursor to smoothed analysis — the distinction being that the random choice of inequali- 
ties cannot be viewed as a perturbation, as different choices yield radically different linear 
programs. Haimovich and Adler both proved that parametric simplex methods would take 
an expected linear number of steps to go from the vertex minimizing the objective function 
to the vertex maximizing the objective function, even conditioned on the program being 
feasible. While their theorems confirmed the intuitions of many practitioners, they were 
geometric rather than algorithmic,^1 as it was not clear how an algorithm would locate either
vertex. Building on these analyses, Todd [Tod86], Adler and Megiddo [AM85], and Adler, 
Karp and Shamir [AKS87] analyzed parametric algorithms for linear programming under 
this model and proved quadratic bounds on their expected running time. While the random 
inputs considered in these analyses are not as special as the random inputs obtained from 
spherically symmetric distributions, the model of randomly flipped inequalities provokes 
some similar objections. 

1.2 Smoothed Analysis of Algorithms and Related Work 

We introduce the smoothed analysis of algorithms in the hope that it will help explain the 
good practical performance of many algorithms that worst-case does not and for which 
average-case analysis is unconvincing. Our first application of the smoothed analysis of 
algorithms will be to the simplex method. We will consider the maximum over A and y of 
the expected running time of the simplex method on inputs of the form 

    maximize    $z^T x$
    subject to  $(A + G) x \le (y + h)$,    (2)

where we let $A$ and $y$ be arbitrary and $G$ and $h$ be a matrix and a vector of independently
chosen Gaussian random variables of mean 0 and standard deviation $\sigma \max_i \|(y_i, a_i)\|$. If
we let $\sigma$ go to 0, then we obtain the worst-case complexity of the simplex method; whereas,
if we let $\sigma$ be so large that $G$ swamps out $A$, we obtain the average-case analyzed by
Borgwardt. By choosing polynomially small $\sigma$, this analysis combines advantages of worst-case
and average-case analysis, and roughly corresponds to the notion of imprecision in
low-order digits.
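
As a concrete illustration of the perturbation model, the sketch below adds independent Gaussian noise of mean 0 and standard deviation sigma times max_i ||(y_i, a_i)|| to every entry of the constraint data. It uses only the Python standard library; the function name `perturb_lp` and the list-of-rows representation of the matrix are assumptions made for this example and do not come from the paper.

```python
import math
import random


def perturb_lp(A, y, sigma, rng):
    """Return (A + G, y + h), where G and h have i.i.d. Gaussian entries of
    mean 0 and standard deviation sigma * max_i ||(y_i, a_i)||.

    A is a list of rows a_i and y is the list of right-hand sides y_i.
    """
    # Largest Euclidean norm of a constraint row extended by its right-hand side.
    scale = max(
        math.sqrt(y_i ** 2 + sum(a * a for a in a_i)) for a_i, y_i in zip(A, y)
    )
    std = sigma * scale
    A_pert = [[a + rng.gauss(0.0, std) for a in a_i] for a_i in A]
    y_pert = [y_i + rng.gauss(0.0, std) for y_i in y]
    return A_pert, y_pert


if __name__ == "__main__":
    rng = random.Random(0)
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    y = [1.0, 1.0, 1.5]
    print(perturb_lp(A, y, 0.01, rng))
```

With sigma equal to 0 the data are returned unchanged, which corresponds to the worst-case end of the interpolation; as sigma grows the noise dominates the original data.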

^1 Our results in Section 4 are analogous to these results.

In a smoothed analysis of an algorithm, we assume that the inputs to the algorithm are
subject to slight random perturbations, and we measure the complexity of the algorithm in 
terms of the input size and the standard deviation of the perturbations. If an algorithm has 
low smoothed complexity, then one should expect it to work well in practice since most real- 
world problems are generated from data that is inherently noisy. Another way of thinking 
about smoothed complexity is to observe that if an algorithm has low smoothed complexity, 
then one must be unlucky to choose an input instance on which it performs poorly. 

We now provide some definitions for the smoothed analysis of algorithms that take real 
or complex inputs. For an algorithm $A$ and input $x$, let

    $C_A(x)$

be a complexity measure of $A$ on input $x$. Let $X$ be the domain of inputs to $A$, and let
$X_n$ be the set of inputs of size $n$. The size of an input can be measured in various ways.
Standard measures are the number of real variables contained in the input and the sums
of the bit-lengths of the variables. Using this notation, one can say that $A$ has worst-case
$C$-complexity $f(n)$ if

    $\max_{x \in X_n} C_A(x) = f(n)$.

Given a family of distributions $\mu_n$ on $X_n$, we say that $A$ has average-case $C$-complexity $f(n)$
under $\mu$ if

    $E_{x \leftarrow \mu_n} [C_A(x)] = f(n)$.

Similarly, we say that $A$ has smoothed $C$-complexity $f(n, \sigma)$ if

    $\max_{x \in X_n} E_g [C_A(x + (\sigma \|x\|) g)] = f(n, \sigma)$,    (3)

where $(\sigma \|x\|) g$ is a vector of Gaussian random variables of mean 0 and standard deviation
$\sigma \|x\|$, and $\|x\|$ is a measure of the magnitude of $x$, such as the largest element or the norm.
We say that an algorithm has polynomial smoothed complexity if its smoothed complexity is
polynomial in $n$ and $1/\sigma$. In Section 6, we present some generalizations of the definition of
smoothed complexity that might prove useful. To further contrast smoothed analysis with
average-case analysis, we note that the probability mass in (3) is concentrated in a region of
radius $O(\sigma \sqrt{n})$ and volume at most $O(\sigma \sqrt{n})^n$, and so, when $\sigma$ is small, this region contains
an exponentially small fraction of the probability mass in an average-case analysis. Thus,
even an extension of average-case analysis to higher moments will not imply meaningful
bounds on smoothed complexity.
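
The definition of smoothed complexity can be explored numerically. The sketch below is a Monte Carlo estimate of the quantity in (3), with two simplifications that are assumptions of this example rather than part of the definition: the maximum is taken over a finite list of candidate inputs (so the result is only a lower estimate of the true maximum), and the complexity measure is a toy one, the number of inversions of the input sequence, which equals the number of element shifts insertion sort performs.

```python
import math
import random


def inversions(x):
    """Toy cost measure: number of out-of-order pairs in x."""
    return sum(1 for i in range(len(x)) for j in range(i + 1, len(x)) if x[i] > x[j])


def smoothed_cost(cost, inputs, sigma, trials, rng):
    """Estimate max over `inputs` of E_g[cost(x + sigma * ||x|| * g)]."""
    worst = 0.0
    for x in inputs:
        scale = sigma * math.sqrt(sum(v * v for v in x))  # sigma * ||x||
        total = 0.0
        for _ in range(trials):
            perturbed = [v + rng.gauss(0.0, scale) for v in x]
            total += cost(perturbed)
        worst = max(worst, total / trials)
    return worst


if __name__ == "__main__":
    rng = random.Random(0)
    reversed_input = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
    for sigma in (0.0, 0.05, 0.5, 100.0):
        print(sigma, smoothed_cost(inversions, [reversed_input], sigma, 200, rng))
```

For the reversed input, sigma = 0 reproduces the worst case of 15 inversions, while a very large sigma makes the perturbed input look like a random ordering, whose expected number of inversions is 7.5. Intermediate values of sigma interpolate between the two.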

A discrete analog of smoothed analysis has been studied in a collection of works inspired 
by Santha and Vazirani's semi-random source model [SV86]. In this model, an adversary 
generates an input, and each bit of this input has some probability of being flipped. Blum 
and Spencer [BS95] design a polynomial-time algorithm that $k$-colors $k$-colorable graphs
generated by this model. Feige and Krauthgamer [FK] analyze a model in which the adversary
is more powerful, and use it to show that Turner's algorithm [Tur86] for approximating
the bandwidth performs well on semi-random inputs. They also improve Turner's analysis.
Feige and Kilian [FK98] present polynomial-time algorithms that recover large independent
sets, $k$-colorings, and optimal bisections in semi-random graphs. They also demonstrate
that significantly better results would lead to surprising collapses of complexity classes. 

1.3 Our Results



We consider the maximum over $z$, $\bar{y}$, and $\bar{a}_1, \ldots, \bar{a}_n$ of the expected time taken by a two-phase
shadow vertex simplex method to solve linear programming problems of the form

    maximize    $z^T x$
    subject to  $\langle a_i | x \rangle \le y_i$, for $1 \le i \le n$,    (4)

where each $a_i$ is a Gaussian random vector of standard deviation $\sigma \max_i \|(\bar{y}_i, \bar{a}_i)\|$ centered
at $\bar{a}_i$, and each $y_i$ is a Gaussian random variable of standard deviation $\sigma \max_i \|(\bar{y}_i, \bar{a}_i)\|$
centered at $\bar{y}_i$.

We begin by considering the case in which $\bar{y} = 1$, $\|\bar{a}_i\| \le 1$, and $\sigma < 1/(3\sqrt{d \ln n})$. In
this case, our first result, Theorem 4.0.1, says that for every vector $t$ the expected size of
the shadow of the polytope (the projection of the polytope defined by the equations (4)
onto the plane spanned by $t$ and $z$) is polynomial in $n$, the dimension, and $1/\sigma$. This
result is the geometric foundation of our work, but it does not directly bound the running
time of an algorithm, as the shadow relevant to the analysis of an algorithm depends on
the perturbed program and cannot be specified beforehand as the vector $t$ must be. In
Section 3.3, we describe a two-phase shadow-vertex simplex algorithm, and in Section 5 we
use Theorem 4.0.1 as a black box to show that it takes expected time polynomial in $n$, $d$,
and $1/\sigma$ in the case described above.

Efforts have been made to analyze how much the solution of a linear program can 
change as its data is perturbed. For an introduction to such analyses, and an analysis of 
the complexity of interior point methods in terms of the resulting condition number, we 
refer the reader to the work of Renegar [Ren95b, Ren95a, Ren94]. 

1.4 Intuition Through Condition Numbers 

For those already familiar with the simplex method and condition numbers, we include this 
section to provide some intuition for why our results should be true. 

Our analysis will exploit geometric properties of the condition number of a matrix, rather 
than of a linear program. We start with the observation that if a corner of a polytope is 
specified by the equation $A_J x = y_J$, where $J$ is a $d$-set, then the condition number of the
matrix $A_J$ provides a good measure of how far the corner is from being flat. Moreover, it is
relatively easy to show that if $A$ is subject to perturbation, then it is unlikely that $A_J$ has
poor condition number. So, it seems intuitive that if $A$ is perturbed, then most corners of
the polytope should have angles bounded away from being flat. This already provides some
intuition as to why the simplex method should run quickly: one should make reasonable
progress as one rounds a corner if it is not too flat.

There are two difficulties in making the above intuition rigorous: the first is that even
if $A_J$ is well-conditioned for most sets $J$, it is not clear that $A_J$ will be well-conditioned for
most sets $J$ that are bases of corners of the polytope. The second difficulty is that even
if most corners of the polytope have reasonable condition number, it is not clear that a 
simplex method will actually encounter many of these corners. By analyzing the shadow 
vertex pivot rule, it is possible to resolve both of these difficulties. 

The first advantage of studying the shadow vertex pivot rule is that its analysis comes 
down to studying the expected sizes of shadows of the polytope. From the specification of 
the plane onto which the polytope will be projected, one obtains a characterization of all
the corners that will be in the shadow, thereby avoiding the complication of an iterative 
characterization. The second advantage is that these corners are specified by the property 
that they optimize a particular objective function, and using this property one can actually 
bound the probability that they are ill-conditioned. While the results of Section 4 are not 
stated in these terms, this is the intuition behind them. 

Condition numbers also play a fundamental role in our analysis of the shadow-vertex 
algorithm. The analysis of the algorithm differs from the mere analysis of the sizes of 
shadows in that, in the study of an algorithm, the plane onto which the polytope is pro- 
jected depends upon the polytope itself. This correlation of the plane with the polytope 
complicates the analysis, but is also resolved through the help of condition numbers. In 
our analysis, we view the perturbation as the composition of two perturbations, where the 
second is small relative to the first. We show that our choice of the plane onto which we 
project the shadow is well-conditioned with high probability after the first perturbation. 
That is, we show that the second perturbation is unlikely to substantially change the plane 
onto which we project, and therefore unlikely to substantially change the shadow. Thus, it 
suffices to measure the expected size of the shadow obtained after the second perturbation 
onto the plane that would have been chosen after just the first perturbation. 

The technical lemma that enables this analysis, Lemma 5.1.1, is a concentration result
that proves that it is highly unlikely that almost all of the minors of a random matrix have 
poor condition number. This analysis also enables us to show that it is highly unlikely that 
we will need a large "big-M" in phase I of our algorithm. 

We note that the condition numbers of the $A_J$s have been studied before in the complexity
of linear programming algorithms. The condition number $\bar{\chi}_A$ of Vavasis and Ye [VY96]
measures the condition number of the worst sub-matrix $A_J$, and their algorithm runs in
time proportional to $\ln(\bar{\chi}_A)$. Todd, Tunçel, and Ye [TTY01] have shown that for a Gaussian
random matrix the expectation of $\ln(\bar{\chi}_A)$ is $O(\min(d \ln n, n))$. That is, they show that
it is unlikely that any $A_J$ is exponentially ill-conditioned. It is relatively simple to apply
the techniques of Section 5.1 to obtain a similar result in the smoothed case. We wonder
whether our concentration result that it is exponentially unlikely that many $A_J$ are
even polynomially ill-conditioned could be used to obtain a better smoothed analysis of the
Vavasis-Ye algorithm.

1.5 Discussion 

One can debate whether the definition of polynomial smoothed complexity should be that
an algorithm have complexity polynomial in $1/\sigma$ or $\log(1/\sigma)$. We believe that the choice
of being polynomial in $1/\sigma$ will prove more useful as the other definition is too strong and
quite similar to the notion of being polynomial in the worst case. In particular, one can
convert any algorithm for linear programming whose smoothed complexity is polynomial in
$d$, $n$ and $\log(1/\sigma)$ into an algorithm whose worst-case complexity is polynomial in $d$, $n$, and
$L$. That said, one should certainly prefer complexity bounds that are lower as a function of
$1/\sigma$, $d$ and $n$.

We also remark that a simple examination of the constructions that provide exponential
lower bounds for various pivot rules [KM72, Mur80, GS79, Gol83, AC78, Jer73] reveals that
none of these pivot rules have smoothed complexity polynomial in $n$ and sub-polynomial
in $1/\sigma$. That is, these constructions are unaffected by exponentially small perturbations.






2 Notation and Mathematical Preliminaries 



In this section, we define the notation that will be used in the paper. We will also review
some background from mathematics and derive a few simple statements that we will need. 
The reader should probably skim this section now, and save a more detailed examination 
for when the relevant material is referenced. 

• $[n]$ denotes the set of integers between 1 and $n$, and $\binom{[n]}{k}$ denotes the subsets of $[n]$ of
size $k$.

• Subsets of $[n]$ are denoted by the capital Roman letters $I$, $J$, $L$, $K$. $M$ will denote a
subset of integers, and $\mathcal{K}$ will denote a set of subsets of $[n]$.

• Subsets of are denoted by the capital Roman letters A, B, P,Q,R, S,T,U,V . 

• Vectors in IR^ are denoted by bold lower-case Roman letters, such as Oj, di, di, bi, Ci, 
di, h, t, q, z, y. 

• Whenever a vector, say $a \in \mathbb{R}^d$, is present, its components will be denoted by lower-case
Roman letters with subscripts, such as $a_1, \ldots, a_d$.

• Whenever a collection of vectors, such as $a_1, \ldots, a_n$, are present, the similar bold
upper-case letter, such as $A$, will denote the matrix of these vectors. For $I \in \binom{[n]}{k}$,
$A_I$ will denote the matrix of those $a_i$ for which $i \in I$.

• Matrices are denoted by bold upper-case Roman letters, such as A, A, A, B, M and 

• $S^{d-1}$ denotes the unit sphere in $\mathbb{R}^d$.

• Vectors in will be denoted by bold Greek letters, such as uj,iI^,t. 

• Generally speaking, univariate quantities with scale, such as lengths or heights, will
be represented by lower case Roman letters such as $c$, $h$, $l$, $r$, $s$, and $t$. The principal
exceptions are that $\kappa$ and $M$ will also denote such quantities.

• Quantities without scale, such as the ratios of quantities with scale or affine coordinates,
will be represented by lower case Greek letters such as $\alpha$, $\beta$, $\lambda$, $\xi$, $\zeta$. $\boldsymbol{\alpha}$ will
denote a vector of such quantities such as $(\alpha_1, \ldots, \alpha_d)$.

• Density functions are denoted by lower case Greek letters such as $\mu$ and $\nu$.

• The standard deviations of Gaussian random variables are denoted by lower-case
Greek letters such as $\sigma$, $\tau$ and $\rho$.

• Indicator random variables are denoted by upper case Roman letters, such as A, B, 
E, F, V, W, X, Y, and Z 

• Functions into the reals or integers will be denoted by calligraphic upper-case letters, 
such as J^,Q,S~^ ,S' ,T. 

• Functions into are denoted by upper-case Greek letters, such as T, ^.

• $\langle x | y \rangle$ denotes the inner product of vectors $x$ and $y$.

• For vectors $\omega$ and $z$, we let angle$(\omega, z)$ denote the angle between these vectors at
the origin.

• The logarithm base 2 is written Ig and the natural logarithm is written In. 

• The probability of an event A is written Pr [A] , and the expectation of a variable X 
is written E [X] . 

• The indicator random variable for an event $A$ is written $[A]$.

2.1 Geometric Definitions

For the following definitions, we let $a_1, \ldots, a_k$ denote a set of vectors in $\mathbb{R}^d$.

• Span$(a_1, \ldots, a_k)$ denotes the subspace spanned by $a_1, \ldots, a_k$.

• Aff$(a_1, \ldots, a_k)$ denotes the hyperplane that is the affine span of $a_1, \ldots, a_k$: the set
of points $\sum_i \alpha_i a_i$, where $\sum_i \alpha_i = 1$.

• ConvHull$(a_1, \ldots, a_k)$ denotes the convex hull of $a_1, \ldots, a_k$.

• Cone$(a_1, \ldots, a_k)$ denotes the positive cone through $a_1, \ldots, a_k$: the set of points
$\sum_i \alpha_i a_i$, for $\alpha_i \ge 0$.

• $\triangle(a_1, \ldots, a_d)$ denotes the simplex ConvHull$(a_1, \ldots, a_d)$.
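
The definitions above differ only in the constraints placed on the coefficients of a linear combination. The following sketch makes this concrete in the plane for two linearly independent vectors: it solves for the coefficients with Cramer's rule and then tests the conditions for Cone, Aff and ConvHull. The restriction to two vectors in two dimensions, the function names and the numerical tolerance are all assumptions of this example.

```python
def coefficients(a1, a2, p):
    """Solve alpha1 * a1 + alpha2 * a2 = p in the plane by Cramer's rule."""
    det = a1[0] * a2[1] - a1[1] * a2[0]
    if det == 0:
        raise ValueError("a1 and a2 must be linearly independent")
    alpha1 = (p[0] * a2[1] - p[1] * a2[0]) / det
    alpha2 = (a1[0] * p[1] - a1[1] * p[0]) / det
    return alpha1, alpha2


def in_cone(a1, a2, p, tol=1e-12):
    """p lies in Cone(a1, a2): both coefficients are non-negative."""
    alpha1, alpha2 = coefficients(a1, a2, p)
    return alpha1 >= -tol and alpha2 >= -tol


def in_aff(a1, a2, p, tol=1e-12):
    """p lies in Aff(a1, a2): the coefficients sum to 1."""
    alpha1, alpha2 = coefficients(a1, a2, p)
    return abs(alpha1 + alpha2 - 1.0) <= tol


def in_conv_hull(a1, a2, p, tol=1e-12):
    """p lies in ConvHull(a1, a2): non-negative coefficients summing to 1."""
    return in_cone(a1, a2, p, tol) and in_aff(a1, a2, p, tol)


if __name__ == "__main__":
    a1, a2 = (1.0, 0.0), (0.0, 1.0)
    for p in [(0.5, 0.5), (2.0, 3.0), (2.0, -1.0)]:
        print(p, in_cone(a1, a2, p), in_aff(a1, a2, p), in_conv_hull(a1, a2, p))
```

For the unit vectors, the point (0.5, 0.5) lies in all three sets, (2, 3) lies only in the cone, and (2, -1) lies only in the affine span.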

For a linear program specified by $a_1, \ldots, a_n$, $y$ and $z$, we will say that the linear program
is in general position if

• The points $a_1, \ldots, a_n$ are in general position with respect to $y$, which means that for
all $I \in \binom{[n]}{d}$ and $x = A_I^{-1} y_I$, and all $j \notin I$, $\langle a_j | x \rangle \ne y_j$.

• For all $I \in \binom{[n]}{d-1}$, $z \notin$ Cone$(A_I)$.

Furthermore, we will say that the linear program is in general position with respect to a
vector $t$ if the set of $\lambda$ for which there exists an $I \in \binom{[n]}{d-1}$ such that

    $(1 - \lambda) t + \lambda z \in$ Cone$(A_I)$

is finite and does not contain 0.

2.2 Vector and Matrix Norms

The material of this section is principally used in Sections 3.3 and 5.1. The following 
definitions and propositions are standard, and may be found in standard texts on Numerical 
Linear Algebra. 

Definition 2.2.1 (Vector norms) For a vector $x$, we define

• $\|x\|_1 = \sum_i |x_i|$, and

• $\|x\|_\infty = \max_i |x_i|$.

Proposition 2.2.2 (Vector norms) For a vector $x \in \mathbb{R}^d$,

    $\|x\| \le \|x\|_1 \le \sqrt{d} \, \|x\|$,

where $\|x\|$ denotes the Euclidean norm.
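
The norm comparison is easy to check numerically. The sketch below computes the three norms with the standard library and tests the inequality, read here as ||x|| <= ||x||_1 <= sqrt(d) ||x|| with ||x|| the Euclidean norm; the helper names and the floating-point tolerance are assumptions of this example.

```python
import math


def norm2(x):
    """Euclidean norm."""
    return math.sqrt(sum(v * v for v in x))


def norm1(x):
    """Sum of absolute values."""
    return sum(abs(v) for v in x)


def norm_inf(x):
    """Largest absolute value."""
    return max(abs(v) for v in x)


def check_norm_bounds(x, tol=1e-12):
    """True if ||x|| <= ||x||_1 <= sqrt(d) * ||x|| up to rounding."""
    d = len(x)
    return norm2(x) <= norm1(x) + tol and norm1(x) <= math.sqrt(d) * norm2(x) + tol


if __name__ == "__main__":
    for x in ([3.0, -4.0], [1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 7.0]):
        print(x, norm2(x), norm1(x), norm_inf(x), check_norm_bounds(x))
```

The left inequality is tight for vectors with a single non-zero entry, and the right one is tight when all entries have the same magnitude.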

Definition 2.2.3 (Matrix norm) For a matrix $A$, we define

    $\|A\| = \max_x \|A x\| / \|x\|$.

Proposition 2.2.4 (Properties of matrix norm) For $d$-by-$d$ matrices $A$ and $B$, and a
$d$-vector $x$,

(a) $\|A x\| \le \|A\| \, \|x\|$.

(b) $\|A B\| \le \|A\| \, \|B\|$.

(c) $\|A\| = \|A^T\|$.

(d) $\|A\| \le \sqrt{d} \max_i \|a_i\|$, where $A = (a_1, \ldots, a_d)$.

(e) $\det(A) \le \|A\|^d$.

Definition 2.2.5 ($s_{\min}()$) For a matrix $A$, we define

    $s_{\min}(A) = \|A^{-1}\|^{-1}$.

We recall that $s_{\min}(A)$ is the smallest singular value of the matrix $A$, and that it is not a
norm.

Proposition 2.2.6 (Properties of $s_{\min}()$) For $d$-by-$d$ matrices $A$ and $B$,

(a) $s_{\min}(A) = \min_x \|A x\| / \|x\|$.

(b) $s_{\min}(B) \ge s_{\min}(A) - \|A - B\|$.
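
For 2-by-2 matrices the singular values have a closed form, which allows a dependency-free check of the perturbation bound, read here as s_min(B) >= s_min(A) - ||A - B||, and of the determinant bound det(A) <= ||A||^d. The sketch below is restricted to the 2-by-2 case as an assumption of this example; the helper names are illustrative.

```python
import math


def singular_values_2x2(A):
    """Return (largest, smallest) singular value of a 2-by-2 matrix.

    They are the square roots of the eigenvalues of A^T A, which solve
    lambda^2 - s * lambda + det^2 = 0 with s the sum of squared entries.
    """
    (a, b), (c, d) = A
    s = a * a + b * b + c * c + d * d
    det = a * d - b * c
    disc = math.sqrt(max(s * s - 4.0 * det * det, 0.0))
    return math.sqrt((s + disc) / 2.0), math.sqrt(max((s - disc) / 2.0, 0.0))


def op_norm(A):
    """Matrix norm ||A|| = max_x ||Ax|| / ||x||, the largest singular value."""
    return singular_values_2x2(A)[0]


def s_min(A):
    """Smallest singular value of A."""
    return singular_values_2x2(A)[1]


def matrix_sub(A, B):
    return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]


if __name__ == "__main__":
    A = [[3.0, 0.0], [0.0, 1.0]]
    B = [[3.0, 0.2], [-0.1, 0.7]]
    print(s_min(B), ">=", s_min(A) - op_norm(matrix_sub(A, B)))
```

The bound in (b) is what lets a small perturbation of a well-conditioned matrix remain well-conditioned, which is how it is used in the later sections.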

2.3 Probability

For an event, A, we let [A] denote the indicator random variable for the event. We generally 
describe random variables by their density functions. If $x$ has density $\mu$, then

    $\Pr[A(x)] = \int [A(x)] \mu(x) \, dx$.



If B is another event, then 



In a context where multiple densities are present, we will use the notation $\Pr_\mu[A(x)]$
to indicate the probability of $A$ when $x$ is distributed according to $\mu$.

In many situations, we will not know the density $\mu$ of a random variable $x$, but rather
a function $\nu$ such that $\mu(x) = c \nu(x)$ for some constant $c$. In this case, we will say that $x$
has density proportional to $\nu$.

The following Propositions and Lemmas will play a prominent role in the proofs in this 
paper. The only one of these which might not be intuitively obvious is Lemma 2.3.5. 

Proposition 2.3.1 (Average $\le$ maximum) Let $\mu(x, y)$ be a density function, and let
$x$ and $y$ be distributed according to $\mu(x, y)$. If $A(x, y)$ is an event and $X(x, y)$ is a random
variable, then

    $\Pr_{x,y}[A(x, y)] \le \max_x \Pr_y[A(x, y)]$, and
    $E_{x,y}[X(x, y)] \le \max_x E_y[X(x, y)]$,

where in the right-hand terms, $y$ is distributed according to the induced distribution $\mu(x, y)$.

Proposition 2.3.2 (Expectation on sub-domain) Let $x$ be a random variable and $A(x)$
an event. Let $P$ be a measurable subset of the domain of $x$. Then,

    $\Pr_{x \in P}[A(x)] \le \Pr[A(x)] / \Pr[x \in P]$.

Proof By the definition of conditional probability,

    $\Pr_{x \in P}[A(x)] = \Pr[A(x) \mid x \in P]$
        $= \Pr[A(x) \text{ and } x \in P] / \Pr[x \in P]$, by Bayes' rule,
        $\le \Pr[A(x)] / \Pr[x \in P]$.



Lemma 2.3.3 (Comparing expectations) Let $X$ and $Y$ be non-negative random variables
and $A$ an event satisfying (1) $X \le k$, (2) $\Pr[A] \ge 1 - \epsilon$, and (3) there exists a
constant $c$ such that $E[X | A] \le c E[Y | A]$. Then,

    $E[X] \le c E[Y] + \epsilon k$.

Proof

    $E[X] = E[X | A] \Pr[A] + E[X | \text{not}(A)] \Pr[\text{not}(A)]$
        $\le c E[Y | A] \Pr[A] + \epsilon k$
        $\le c E[Y] + \epsilon k$,

by Proposition 2.3.2. ■

Lemma 2.3.4 (Similar distributions) Let $X$ be a non-negative random variable such
that $X \le k$. Let $\nu$ and $\mu$ be density functions for which there exists a set $S$ such that (1)
$\Pr_\nu[S] \ge 1 - \epsilon$ and (2) there exists a constant $c \ge 1$ such that for all $a \in S$, $\nu(a) \le c \mu(a)$.
Then,

    $E_\nu[X(a)] \le c E_\mu[X(a)] + k \epsilon$.

Proof We write

    $E_\nu[X] = \int_{a \in S} X(a) \nu(a) \, da + \int_{a \notin S} X(a) \nu(a) \, da$
        $\le c \int_{a \in S} X(a) \mu(a) \, da + k \epsilon$
        $\le c \int_{a} X(a) \mu(a) \, da + k \epsilon$
        $= c E_\mu[X] + k \epsilon$.



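Lemma 2.3.5 (Combination lemma) Let $x$ and $y$ be random variables distributed according
to $\mu(x, y)$. Let $\mathcal{F}(x)$ and $\mathcal{G}(x, y)$ be non-negative functions and $\alpha$ and $\beta$ be constants
such that

• $\forall \epsilon > 0$, $\Pr_{x,y}[\mathcal{F}(x) \le \epsilon] \le \alpha \epsilon$, and

• $\forall \epsilon > 0$, $\max_x \Pr_y[\mathcal{G}(x, y) \le \epsilon] \le (\beta \epsilon)^2$,

where in the second line $y$ is distributed according to the induced density $\mu(x, y)$. Then

    $\Pr_{x,y}[\mathcal{F}(x) \mathcal{G}(x, y) \le \epsilon] \le 4 \alpha \beta \epsilon$.

Proof Consider any $x$ and $y$ for which $\mathcal{F}(x) \mathcal{G}(x, y) \le \epsilon$. If $i$ is the integer for which

    $2^i \beta \epsilon < \mathcal{F}(x) \le 2^{i+1} \beta \epsilon$,

then $\mathcal{G}(x, y) \le 2^{-i} / \beta$. Thus, $\mathcal{F}(x) \mathcal{G}(x, y) \le \epsilon$ implies that either $\mathcal{F}(x) \le 2 \beta \epsilon$, or there
exists an integer $i \ge 1$ for which

    $\mathcal{F}(x) \le 2^{i+1} \beta \epsilon$ and $\mathcal{G}(x, y) \le 2^{-i} / \beta$.

So, we obtain the bound

    $\Pr_{x,y}[\mathcal{F}(x) \mathcal{G}(x, y) \le \epsilon]$
        $\le \Pr_{x,y}[\mathcal{F}(x) \le 2 \beta \epsilon] + \sum_{i \ge 1} \Pr_{x,y}[\mathcal{F}(x) \le 2^{i+1} \beta \epsilon \text{ and } \mathcal{G}(x, y) \le 2^{-i} / \beta]$
        $\le 2 \alpha \beta \epsilon + \sum_{i \ge 1} \Pr_{x,y}[\mathcal{F}(x) \le 2^{i+1} \beta \epsilon] \, \Pr_{x,y}[\mathcal{G}(x, y) \le 2^{-i} / \beta \mid \mathcal{F}(x) \le 2^{i+1} \beta \epsilon]$
        $\le 2 \alpha \beta \epsilon + \sum_{i \ge 1} \Pr_{x,y}[\mathcal{F}(x) \le 2^{i+1} \beta \epsilon] \, \max_x \Pr_y[\mathcal{G}(x, y) \le 2^{-i} / \beta]$
        $\le 2 \alpha \beta \epsilon + \sum_{i \ge 1} (2^{i+1} \alpha \beta \epsilon) (2^{-i})^2$, by Proposition 2.3.1,
        $= 2 \alpha \beta \epsilon + \alpha \beta \epsilon \sum_{i \ge 1} 2^{1-i}$
        $= 4 \alpha \beta \epsilon$.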
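
A small simulation can make the combination lemma concrete. The sketch below takes the simplest instance, chosen for this example and not taken from the paper: F is uniform on [0, 1], so Pr[F <= e] = e and alpha = 1, and G is the square root of an independent uniform variable, so Pr[G <= e] = e^2 and beta = 1. For this instance a direct integration gives Pr[FG <= e] = 2e - e^2, comfortably below the bound 4e read off the lemma.

```python
import math
import random


def estimate_product_tail(eps, trials, rng):
    """Monte Carlo estimate of Pr[F * G <= eps] for the toy F and G above."""
    hits = 0
    for _ in range(trials):
        f = rng.random()             # Pr[F <= e] = e, so alpha = 1
        g = math.sqrt(rng.random())  # Pr[G <= e] = e^2, so beta = 1
        if f * g <= eps:
            hits += 1
    return hits / trials


def exact_product_tail(eps):
    """Closed form 2 * eps - eps^2 for this particular instance (0 <= eps <= 1)."""
    return 2.0 * eps - eps * eps


if __name__ == "__main__":
    rng = random.Random(0)
    for eps in (0.01, 0.05, 0.1):
        est = estimate_product_tail(eps, 100000, rng)
        print(eps, est, exact_product_tail(eps), "bound:", 4.0 * eps)
```

In this instance G does not depend on x at all; the lemma is more general, since it only requires the tail bound on G to hold for every fixed x.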



As we have found this lemma very useful in our work, and we suspect others may as
well, we state a more broadly applicable generalization. Its proof is similar.

Lemma 2.3.6 (Generalized combination lemma) Let $x$ and $y$ be random variables
distributed according to $\mu(x, y)$. There exists a function $c(a, b)$ such that if $\mathcal{F}(x)$ and $\mathcal{G}(x, y)$
are non-negative functions and $\alpha$, $\beta$, $a$ and $b$ are constants such that

• $\Pr_{x,y}[\mathcal{F}(x) \le \epsilon] \le (\alpha \epsilon)^a$, and

• $\max_x \Pr_y[\mathcal{G}(x, y) \le \epsilon] \le (\beta \epsilon)^b$,

where in the second line $y$ is distributed according to the induced density $\mu(x, y)$, then

    $\Pr_{x,y}[\mathcal{F}(x) \mathcal{G}(x, y) \le \epsilon] \le c(a, b) \, \alpha \beta \, \epsilon^{\min(a,b)} \lg(1/\epsilon)^{[a = b]}$,

where $[a = b]$ is 1 if $a = b$ and 0 otherwise.

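Lemma 2.3.7 (Almost polynomial densities) Let $k \ge 0$ and let $t$ be a non-negative
random variable with density proportional to $\mu(t) t^k$ such that, for some $t_0 > 0$,

    $\frac{\max_{0 \le t \le t_0} \mu(t)}{\min_{0 \le t \le t_0} \mu(t)} \le c$.

Then,

    $\Pr[t \le \epsilon] \le c (\epsilon / t_0)^{k+1}$.

Proof For $\epsilon > t_0$, the lemma is vacuously true. Assuming $\epsilon \le t_0$,

    $\Pr[t \le \epsilon] \le \frac{\Pr[t \le \epsilon]}{\Pr[t \le t_0]}$
        $\le \frac{\max_{0 \le t \le t_0} \mu(t) \int_0^{\epsilon} t^k \, dt}{\min_{0 \le t \le t_0} \mu(t) \int_0^{t_0} t^k \, dt}$
        $\le c \, \frac{\epsilon^{k+1} / (k+1)}{t_0^{k+1} / (k+1)}$
        $= c (\epsilon / t_0)^{k+1}$.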
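
The bound for almost polynomial densities can be checked by numerical integration. The sketch below takes a density proportional to mu(t) t^k on a bounded interval [0, t_max] and compares Pr[t <= eps] with c (eps / t_0)^(k+1), where c is the ratio of the maximum to the minimum of mu on [0, t_0]. The bounded support, the midpoint rule and the particular choice mu(t) = exp(-t) in the usage example are assumptions of this illustration.

```python
import math


def integrate(f, lo, hi, steps=10000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


def tail_and_bound(mu, k, eps, t0, t_max, c):
    """Return (Pr[t <= eps], c * (eps / t0)^(k+1)) for density ~ mu(t) * t^k on [0, t_max]."""
    density = lambda t: mu(t) * t ** k
    prob = integrate(density, 0.0, eps) / integrate(density, 0.0, t_max)
    return prob, c * (eps / t0) ** (k + 1)


if __name__ == "__main__":
    # mu(t) = exp(-t) has max/min ratio e on [0, 1], so c = e for t0 = 1.
    print(tail_and_bound(lambda t: math.exp(-t), 2, 0.1, 1.0, 10.0, math.e))
```

When mu is constant and t_max equals t_0, the bound holds with equality, which shows that the exponent k + 1 cannot be improved in general.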



2.4 Gaussian Random Vectors 

For the convenience of the reader, we recall some standard facts about Gaussian random 
variables and vectors. These may be found in [Fel68, VII. 1] and [Fel71, III. 6]. We then 
draw some corollaries of these facts and derive some lemmas that we will need later in the 
paper. 

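We first recall that a univariate Gaussian distribution with mean 0 and standard deviation
$\sigma$ has density

    $\frac{1}{\sqrt{2\pi} \, \sigma} e^{-x^2 / 2\sigma^2}$,

and that a Gaussian random vector in $\mathbb{R}^d$ centered at a point $\bar{a}$ with covariance matrix $M$
has density

    $\frac{1}{(\sqrt{2\pi})^d \sqrt{\det(M)}} e^{-(a - \bar{a})^T M^{-1} (a - \bar{a}) / 2}$.

For positive-definite $M$, there exists a basis in which the density can be written

    $\prod_{i=1}^{d} \frac{1}{\sqrt{2\pi} \, \sigma_i} e^{-(a_i - \bar{a}_i)^2 / 2\sigma_i^2}$,

where $\sigma_1^2 \le \cdots \le \sigma_d^2$ are the eigenvalues of $M$. When all the eigenvalues of $M$ are the
same and equal to $\sigma^2$, then we will refer to the density as a Gaussian distribution of standard
deviation $\sigma$.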

Proposition 2.4.1 (Additivity of Gaussians) If $a_1$ is a Gaussian random vector with
covariance matrix $M_1$ centered at a point $\bar{a}_1$ and $a_2$ is a Gaussian random vector with
covariance matrix $M_2$ centered at a point $\bar{a}_2$, then $a_1 + a_2$ is the Gaussian random vector
with covariance matrix $M_1 + M_2$ centered at $\bar{a}_1 + \bar{a}_2$.
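Lemma 2.4.2 (Smoothness of Gaussians) Let $\mu(x)$ be a Gaussian distribution of standard
deviation $\sigma$ centered at a point $a$. Let $k \ge 1$, let $\mathrm{dist}(x, a) \le k$ and let $\mathrm{dist}(x, y) \le \epsilon \le k$. Then,

\[ \frac{\mu(y)}{\mu(x)} \ge e^{-3k\epsilon/2\sigma^{2}}. \]

Proof By translating $a$, $x$ and $y$, we may assume $a = 0$ and $\|x\| \le k$. We then have

\[
\begin{aligned}
\frac{\mu(y)}{\mu(x)} &= e^{-(\|y\|^{2} - \|x\|^{2})/2\sigma^{2}} \\
&\ge e^{-(2\epsilon\|x\| + \epsilon^{2})/2\sigma^{2}}, \quad \text{as } \|y\| \le \|x\| + \epsilon, \\
&\ge e^{-(2\epsilon k + \epsilon^{2})/2\sigma^{2}}, \quad \text{as } \|x\| \le k, \\
&\ge e^{-3k\epsilon/2\sigma^{2}}, \quad \text{as } \epsilon \le k.
\end{aligned}
\]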



Proposition 2.4.3 (Restrictions of Gaussians) Let $\mu$ be a Gaussian distribution of standard
deviation $\sigma$ centered at a point $a$. Let $v$ be any vector and $r$ any real. Then, the induced
distribution

\[ \mu(x \mid v^{T}x = r) \]

is a Gaussian distribution of standard deviation $\sigma$ centered at the projection of $a$ onto the
plane $\{x : v^{T}x = r\}$.

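Proposition 2.4.4 (Gaussian measure of halfspaces) Let $\omega$ be any unit vector in $\mathbb{R}^{d}$
and $r$ any real. Then,

\[ \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{d} \int_{g : \langle \omega | g \rangle \le r} e^{-\|g\|^{2}/2\sigma^{2}}\, dg = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{t=-\infty}^{r} e^{-t^{2}/2\sigma^{2}}\, dt. \]

Proof Immediate if one expresses the Gaussian density in a basis containing $\omega$. ■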

The distribution of the square of the norm of a Gaussian random vector is the Chi- 
Square distribution. We use the following weak bound on the Chi-Square distribution, 
which follows from Equality (26.4.8) of [AS70]. 

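Proposition 2.4.5 (Chi-Square bound) Let $x$ be a Gaussian random vector in $\mathbb{R}^{d}$ of
standard deviation $\sigma$ centered at the origin. Then,

\[ \Pr\left[\|x\| \ge k\sigma\right] \le \frac{k^{d-2}\, e^{-k^{2}/2}}{2^{d/2-1}\,\Gamma(d/2)}. \qquad (5) \]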



From this, we derive 
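Corollary 2.4.6 (A chi-square bound) Let $x$ be a Gaussian random vector in $\mathbb{R}^{d}$ of
standard deviation $\sigma$ centered at the origin. Then, for $n \ge 3$,

\[ \Pr\left[\|x\| \ge 3\sqrt{d \ln n}\,\sigma\right] \le n^{-2.9d}. \]

Moreover, if $n > d \ge 3$, and $x_1, \ldots, x_n$ are such vectors, then

\[ \Pr\left[\max_{i} \|x_i\| \ge 3\sqrt{d \ln n}\,\sigma\right] \le n^{-2.9d+1}. \]

Proof For $\alpha = 3\sqrt{\ln n}$ we can apply Stirling's formula [AS70] to (5) to find

\[
\begin{aligned}
\Pr\left[\|x\| \ge \alpha\sqrt{d}\,\sigma\right]
&\le \frac{(\alpha^{2}d)^{d/2-1}\, e^{-\alpha^{2}d/2}}{2^{d/2-1}\,\Gamma(d/2)} \\
&\le \frac{(\alpha^{2}d)^{d/2-1}\, e^{-\alpha^{2}d/2}\, e^{d/2}\sqrt{d/2}}{2^{d/2-1}\,(d/2)^{d/2}\sqrt{2\pi}} \\
&= \frac{\alpha^{d-2}\, e^{-(\alpha^{2}-1)d/2}}{\sqrt{d\pi}} \\
&\le (\alpha^{2})^{d/2}\, e^{-(\alpha^{2}-1)d/2} \\
&= e^{-(\alpha^{2} - \ln(\alpha^{2}) - 1)d/2} \\
&\le e^{-2.9\,d \ln n} \\
&= n^{-2.9d},
\end{aligned}
\]

as

\[ \alpha^{2} - \ln(\alpha^{2}) - 1 = 9\ln(n) - \ln(9 \ln n) - 1 \ge \ln(n)(9 - \ln 9 - 1) \ge 5.8\ln(n). \]

The second bound follows from the first by a union bound over the $n$ vectors.

We also prove it is unlikely that a Gaussian random variable has small norm.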
We also prove it is unlikely that a Gaussian random variable has small norm. 

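Proposition 2.4.7 (Gaussian near point or plane) Let $x$ be a $d$-dimensional Gaussian
random vector of standard deviation $\sigma$ centered anywhere. Then,

(a) For any point $p$, $\Pr\left[\mathrm{dist}(x, p) \le \epsilon\right] \le \left(\min\left(1, \sqrt{e/d}\right)\epsilon/\sigma\right)^{d}$, and

(b) For a plane $H$ of dimension $h$, $\Pr\left[\mathrm{dist}(x, H) \le \epsilon\right] \le (\epsilon/\sigma)^{d-h}$.

Proof Let $\bar{x}$ be the center of the Gaussian distribution, and let $B_{\epsilon}(p)$ denote the ball of
radius $\epsilon$ around $p$. Recall that the volume of $B_{\epsilon}(p)$ is

\[ \frac{2\pi^{d/2}\epsilon^{d}}{d\,\Gamma(d/2)}. \]

To prove part (a), we bound the probability that $\mathrm{dist}(x, p) \le \epsilon$ by

\[ \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{d} \int_{x \in B_{\epsilon}(p)} e^{-\|x - \bar{x}\|^{2}/2\sigma^{2}}\, dx \le \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{d} \left(\frac{2\pi^{d/2}\epsilon^{d}}{d\,\Gamma(d/2)}\right) = \left(\frac{\epsilon}{\sigma}\right)^{d} \frac{2}{d\,2^{d/2}\,\Gamma(d/2)}. \]

By Proposition 2.4.8, we have for $d \ge 3$

\[ \frac{2}{d\,2^{d/2}\,\Gamma(d/2)} \le (e/d)^{d/2}. \]

Combining with the fact that $2/(d\,2^{d/2}\,\Gamma(d/2)) \le 1$ for all $d \ge 1$, we establish (a).

To prove part (b), we consider a basis in which $d - h$ vectors are perpendicular to $H$,
and apply part (a) to the components of $x$ in the span of those basis vectors. ■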



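Proposition 2.4.8 (Gamma Inequality) For $d \ge 3$,

\[ \frac{2}{d\,2^{d/2}\,\Gamma(d/2)} \le (e/d)^{d/2}. \]

Proof For $d \ge 3$, we apply the inequality $\Gamma(x+1) \ge \sqrt{2\pi}\sqrt{x}\,(x/e)^{x}$ to show

\[
\begin{aligned}
\frac{2}{d\,2^{d/2}\,\Gamma(d/2)}
&\le \frac{2}{d\,2^{d/2}\sqrt{2\pi}\sqrt{(d-2)/2}} \left(\frac{2e}{d-2}\right)^{(d-2)/2} \\
&= \left(\frac{e^{(d-2)/2}}{d^{d/2}\sqrt{2\pi}\sqrt{(d-2)/2}}\right) \left(\frac{d}{d-2}\right)^{(d-2)/2} \\
&\le (e/d)^{d/2},
\end{aligned}
\]

where the last inequality used the inequalities $1 + 2/(d-2) \le e^{2/(d-2)}$ and $\sqrt{2\pi}\sqrt{(d-2)/2} \ge 1$
when $d \ge 3$. ■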
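The Gamma inequality is easy to spot-check numerically. The following sketch is not part of the paper; the helper names `gamma_lhs` and `gamma_rhs` are introduced here for illustration, and logarithms are used only to avoid overflow for larger $d$.

```python
import math


def gamma_lhs(d: int) -> float:
    """Left-hand side 2 / (d * 2^(d/2) * Gamma(d/2)), computed via logs."""
    log_val = (math.log(2.0) - math.log(d)
               - (d / 2) * math.log(2.0) - math.lgamma(d / 2))
    return math.exp(log_val)


def gamma_rhs(d: int) -> float:
    """Right-hand side (e/d)^(d/2)."""
    return math.exp((d / 2) * (1.0 - math.log(d)))


if __name__ == "__main__":
    for d in (3, 4, 10, 50):
        print(d, gamma_lhs(d), gamma_rhs(d))
```

The same left-hand side is also at most 1 for every $d \ge 1$, which is the second fact used in the proof of Proposition 2.4.7 (a).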



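Proposition 2.4.9 (Non-central Gaussian near the origin) For any integer $d \ge 3$,
let $x$ be a $d$-dimensional Gaussian random vector of standard deviation $\sigma$ centered at $\bar{x}$.
Then, for $\epsilon \le 1/(\sqrt{2}\,e)$,

\[ \Pr\left[\|x\| \le \left(\sqrt{\|\bar{x}\|^{2} + d\sigma^{2}}\right)\epsilon\right] \le \left(\sqrt{2e}\,\epsilon\right)^{d}. \]

Proof Let $\lambda = \|\bar{x}\|$. We divide the analysis into two cases: (1) $\lambda \le \sqrt{d}\,\sigma$, and (2)
$\lambda \ge \sqrt{d}\,\sigma$.

For $\lambda \le \sqrt{d}\,\sigma$,

\[ \Pr\left[\|x\| \le \left(\sqrt{\lambda^{2} + d\sigma^{2}}\right)\epsilon\right] \le \Pr\left[\|x\| \le \left(\sqrt{2d}\,\sigma\right)\epsilon\right] \le \left(\sqrt{2e}\,\epsilon\right)^{d}, \]

by part (a) of Proposition 2.4.7.

For $\lambda \ge \sqrt{d}\,\sigma$, let $B_r$ be the ball of radius $r$ around the origin. Applying the assumption
$\epsilon \le 1/(\sqrt{2}\,e)$ and letting $\lambda = c\sqrt{d}\,\sigma$ for $c \ge 1$, we have

\[
\begin{aligned}
\Pr\left[\|x\| \le \left(\sqrt{\lambda^{2} + d\sigma^{2}}\right)\epsilon\right]
&\le \Pr\left[\|x\| \le \left(\sqrt{2}\,\lambda\right)\epsilon\right] \\
&\le \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{d} \left(\frac{2\pi^{d/2}}{d\,\Gamma(d/2)}\right) \left(\sqrt{2}\,\epsilon\lambda\right)^{d} e^{-(1-1/e)^{2}\lambda^{2}/2\sigma^{2}} \\
&\le \left(\sqrt{2e}\,\epsilon\right)^{d} \left(\frac{\lambda^{d}}{d^{d/2}\sigma^{d}}\right) e^{-(1-1/e)^{2}\lambda^{2}/2\sigma^{2}} \\
&= \left(\sqrt{2e}\,\epsilon\right)^{d} e^{d\left(\ln c - c^{2}(1-1/e)^{2}/2\right)} \\
&\le \left(\sqrt{2e}\,\epsilon\right)^{d},
\end{aligned}
\]

where the second inequality holds because $\epsilon \le 1/(\sqrt{2}\,e)$ and for any point $x \in B_{\sqrt{2}\epsilon\lambda}$,

\[ e^{-\|x - \bar{x}\|^{2}/2\sigma^{2}} \le e^{-(1-\sqrt{2}\epsilon)^{2}\lambda^{2}/2\sigma^{2}} \le e^{-(1-1/e)^{2}\lambda^{2}/2\sigma^{2}}, \]

the third inequality follows from Proposition 2.4.8, and the last inequality holds because
one can prove for any $c \ge 1$, $\ln c - c^{2}(1 - 1/e)^{2}/2 < 0$. ■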

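Bounds such as the following on the tails of Gaussian distributions are standard (see,
for example, [Fel68, Section VII.1]).

Proposition 2.4.10 (Gaussian tail bound) For all $x > 0$,

\[ \left(\frac{\sigma}{x} - \frac{\sigma^{3}}{x^{3}}\right) \frac{e^{-x^{2}/2\sigma^{2}}}{\sqrt{2\pi}} \le \frac{1}{\sqrt{2\pi}\,\sigma} \int_{t=x}^{\infty} e^{-t^{2}/2\sigma^{2}}\, dt \le \frac{\sigma}{x}\, \frac{e^{-x^{2}/2\sigma^{2}}}{\sqrt{2\pi}}. \]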
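As a quick numerical illustration (not from the paper), the two-sided Gaussian tail bound can be compared against the exact tail computed with the standard-library complementary error function. The function names below are chosen here for illustration, and the check assumes $x > 0$.

```python
import math


def gaussian_tail(x: float, sigma: float) -> float:
    """Pr[t >= x] for a centered Gaussian of standard deviation sigma."""
    return 0.5 * math.erfc(x / (sigma * math.sqrt(2.0)))


def tail_bounds(x: float, sigma: float) -> tuple[float, float]:
    """Lower and upper bounds on the tail, meaningful for x > 0."""
    base = math.exp(-x * x / (2.0 * sigma * sigma)) / math.sqrt(2.0 * math.pi)
    lower = (sigma / x - (sigma / x) ** 3) * base
    upper = (sigma / x) * base
    return lower, upper


if __name__ == "__main__":
    for sigma in (0.25, 0.5, 1.0):
        for x in (0.5, 1.0, 2.0, 3.0):
            lo, hi = tail_bounds(x, sigma)
            print(sigma, x, lo, gaussian_tail(x, sigma), hi)
```

For $x < \sigma$ the lower bound is negative and therefore trivial; the bounds become tight as $x/\sigma$ grows.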



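Using this, we prove:

Lemma 2.4.11 (Comparing Gaussian tails) Let $\sigma \le 1$ and let

\[ \mu(t) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-t^{2}/2\sigma^{2}}. \]

Then, for $x \le 2$ and $|x - y| \le \epsilon$,

\[ \frac{\int_{t=y}^{\infty} \mu(t)\,dt}{\int_{t=x}^{\infty} \mu(t)\,dt} \ge 1 - \frac{8\epsilon}{3\sigma^{2}}. \qquad (6) \]

Proof If $y < x$, the ratio is greater than 1 and the lemma is trivially true. Assuming
$y \ge x$, the ratio is minimized when $y = x + \epsilon$. In this case, the lemma will follow from

\[ \frac{\int_{t=x}^{x+\epsilon} \mu(t)\,dt}{\int_{t=x}^{\infty} \mu(t)\,dt} \le \frac{8\epsilon}{3\sigma^{2}}. \qquad (7) \]

It follows from part (b) of Proposition 2.4.12 that the left-hand ratio in (7) is monotonically
increasing in $x$, and therefore is maximized when $x$ is maximized at 2. For $x = 2$, we apply
Proposition 2.4.10 to show

\[ \frac{1}{\sqrt{2\pi}\,\sigma} \int_{t=x}^{\infty} e^{-t^{2}/2\sigma^{2}}\, dt \ge \left(\frac{\sigma}{2} - \frac{\sigma^{3}}{8}\right) \frac{e^{-2/\sigma^{2}}}{\sqrt{2\pi}} \ge \frac{3\sigma\, e^{-2/\sigma^{2}}}{8\sqrt{2\pi}}. \]

We then combine this bound with

\[ \frac{1}{\sqrt{2\pi}\,\sigma} \int_{t=x}^{x+\epsilon} e^{-t^{2}/2\sigma^{2}}\, dt \le \frac{\epsilon\, e^{-2/\sigma^{2}}}{\sqrt{2\pi}\,\sigma} \]

to obtain

\[ \frac{\int_{t=x}^{x+\epsilon} \mu(t)\,dt}{\int_{t=x}^{\infty} \mu(t)\,dt} \le \left(\frac{\epsilon\, e^{-2/\sigma^{2}}}{\sqrt{2\pi}\,\sigma}\right) \left(\frac{8\sqrt{2\pi}}{3\sigma\, e^{-2/\sigma^{2}}}\right) = \frac{8\epsilon}{3\sigma^{2}}. \]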



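Proposition 2.4.12 (Monotonicity of Gaussian density) Let

\[ \mu(t) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-t^{2}/2\sigma^{2}}. \]

(a) For all $a > 0$, $\mu(x)/\mu(x + a)$ is monotonically increasing in $x$;
(b) The following ratio is monotonically increasing in $x$:

\[ \frac{\mu(x)}{\int_{t=x}^{\infty} \mu(t)\,dt}. \]

Proof Part (a) follows from

\[ \frac{\mu(x)}{\mu(x + a)} = e^{(2ax + a^{2})/2\sigma^{2}}, \]

and that $e^{ax/\sigma^{2}}$ is monotonically increasing in $x$.
To prove part (b) note that for all $a > 0$

\[ \frac{\int_{t=x}^{\infty} \mu(t)\,dt}{\mu(x)} = \frac{\int_{t=0}^{\infty} \mu(x + t)\,dt}{\mu(x)} \ge \frac{\int_{t=0}^{\infty} \mu(x + a + t)\,dt}{\mu(x + a)} = \frac{\int_{t=x+a}^{\infty} \mu(t)\,dt}{\mu(x + a)}, \]

where the inequality follows from part (a). ■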
2.5 Changes of Variables

The main proof technique used in Section 4 is change of variables. For the reader's
convenience, we recall how a change of variables affects probability distributions.

Proposition 2.5.1 (Change of variables) Let $y$ be a random variable distributed
according to density $\mu$. If $y = \Phi(x)$, then $x$ has density

\[ \mu(\Phi(x)) \left| \det\left(\frac{\partial \Phi(x)}{\partial x}\right) \right|. \]

Recall that

\[ \left| \det\left(\frac{\partial y}{\partial x}\right) \right| \]

is the Jacobian of the change of variables.
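A one-dimensional example may make the change-of-variables formula concrete. In the sketch below, which is an illustration and not taken from the paper, $y$ is a standard Gaussian and the map $\Phi(x) = x^{3} + x$ is an arbitrary monotone choice; the density of $x$ is then $\mu(\Phi(x))\,|\Phi'(x)|$, and it should integrate to 1 and satisfy $\Pr[x \le 1] = \Pr[y \le \Phi(1)] = \Pr[y \le 2]$.

```python
import math
from typing import Callable


def mu(y: float) -> float:
    """Standard Gaussian density of y."""
    return math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)


def phi(x: float) -> float:
    """Hypothetical monotone change of variables y = Phi(x)."""
    return x ** 3 + x


def phi_prime(x: float) -> float:
    """Derivative of Phi; in one dimension this is the Jacobian."""
    return 3.0 * x * x + 1.0


def density_x(x: float) -> float:
    """Density of x induced by y = Phi(x): mu(Phi(x)) * |Phi'(x)|."""
    return mu(phi(x)) * abs(phi_prime(x))


def integrate(f: Callable[[float], float], lo: float, hi: float,
              steps: int = 100_000) -> float:
    """Midpoint-rule quadrature of f on [lo, hi]."""
    h = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * h) for k in range(steps)) * h


if __name__ == "__main__":
    print(integrate(density_x, -3.0, 3.0))   # total mass, close to 1
    print(integrate(density_x, -3.0, 1.0))   # close to Pr[y <= 2]
```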

We now introduce the fundamental change of variables used in this paper. Let $a_1, \ldots, a_d$
be linearly independent points in $\mathbb{R}^{d}$. We will represent these points by specifying the plane
passing through them and their positions on that plane. Many studies of the convex hulls of
random point sets have used this change of variables (for example, see [RS63, RS64, Efr65,
Mil71]). We specify the plane containing $a_1, \ldots, a_d$ by $\omega$ and $r$, where $\|\omega\| = 1$, $r \ge 0$
and $\langle \omega | a_i \rangle = r$ for all $i$. We will not concern ourselves with the issue that $\omega$ is ill-defined
if the $a_1, \ldots, a_d$ are affinely dependent, as this is an event of probability zero. To specify
the positions of $a_1, \ldots, a_d$ on the plane specified by $(\omega, r)$, we must choose a coordinate
system for that plane. To choose a canonical set of coordinates for each $(d-1)$-dimensional
hyperplane specified by $(\omega, r)$, we first fix a reference unit vector in $\mathbb{R}^{d}$, say $q$, and an
arbitrary coordinatization of the subspace orthogonal to $q$. For any $\omega \ne -q$, we let

\[ R_{\omega} \]

denote the linear transformation that rotates $q$ to $\omega$ in the two-dimensional subspace
through $q$ and $\omega$ and that is the identity in the orthogonal subspace. Using $R_{\omega}$, we
can map points specified in the $d-1$ dimensional hyperplane specified by $r$ and $\omega$ to $\mathbb{R}^{d}$
by

\[ a_i = R_{\omega} b_i + r\omega, \]

where $b_i$ is viewed both as a vector in $\mathbb{R}^{d-1}$ and as an element of the subspace orthogonal to
$q$. We will not concern ourselves with the fact that this map is not well defined if $q = -\omega$,
as the set of $a_1, \ldots, a_d$ that result in this coincidence has measure zero.
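The map $a_i = R_{\omega} b_i + r\omega$ can be sketched in a few lines. The code below is an illustration under assumptions made here, not the paper's construction verbatim: vectors are plain Python lists, $b$ is passed already embedded in $\mathbb{R}^{d}$ as a vector orthogonal to $q$, and the rotation is realized with the closed form $R_{\omega} v = v - \frac{\langle q+\omega | v\rangle}{1 + \langle q|\omega\rangle}(q+\omega) + 2\langle q|v\rangle\,\omega$, which rotates $q$ to $\omega$ and fixes everything orthogonal to both.

```python
import math

Vector = list[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def rotate_q_to_omega(q: Vector, omega: Vector, v: Vector) -> Vector:
    """Apply R_omega to v, for unit vectors q and omega with omega != -q."""
    c = dot(q, omega)
    if abs(1.0 + c) < 1e-12:
        raise ValueError("R_omega is not defined when omega = -q")
    s = [qi + wi for qi, wi in zip(q, omega)]
    coeff = dot(s, v) / (1.0 + c)
    qv = dot(q, v)
    return [vi - coeff * si + 2.0 * qv * wi for vi, si, wi in zip(v, s, omega)]


def to_ambient(q: Vector, omega: Vector, r: float, b: Vector) -> Vector:
    """Return a = R_omega b + r * omega, where b is orthogonal to q."""
    rb = rotate_q_to_omega(q, omega, b)
    return [x + r * w for x, w in zip(rb, omega)]


if __name__ == "__main__":
    q = [1.0, 0.0, 0.0]
    omega = [1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0]
    a = to_ambient(q, omega, 0.7, [0.0, 1.5, -0.4])
    print(a, dot(omega, a))  # the inner product with omega recovers r
```

Because $R_{\omega}$ sends the subspace orthogonal to $q$ onto the subspace orthogonal to $\omega$, every image point satisfies $\langle \omega | a \rangle = r$, which is exactly the defining property of the plane.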

The Jacobian of this change of variables is computed by a famous theorem of integral
geometry due to Blaschke [Bla35] (for more modern treatments, see [Mil71] or [San76,
12.24]), and actually depends only marginally on the coordinatizations of the hyperplanes.

Theorem 2.5.2 (Blaschke) For variables $b_1, \ldots, b_d$ taking values in $\mathbb{R}^{d-1}$, $\omega \in S^{d-1}$
and $r \in \mathbb{R}$, let

\[ (a_1, \ldots, a_d) = (R_{\omega} b_1 + r\omega, \ldots, R_{\omega} b_d + r\omega). \]

The Jacobian of this map is

\[ \left| \det\left(\frac{\partial(a_1, \ldots, a_d)}{\partial(\omega, r, b_1, \ldots, b_d)}\right) \right| = (d-1)!\, \mathrm{Vol}(\Delta(b_1, \ldots, b_d)). \]

That is,

\[ da_1 \cdots da_d = (d-1)!\, \mathrm{Vol}(\Delta(b_1, \ldots, b_d))\, d\omega\, dr\, db_1 \cdots db_d. \]
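We will also find it useful to specify the plane by $\omega$ and $s$, where $\langle sq | \omega \rangle = r$, so that
$sq$ lies on the plane specified by $\omega$ and $r$. We will also arrange our coordinate system so
that the origin on this plane lies at $sq$.

Corollary 2.5.3 (Blaschke with s) For variables $b_1, \ldots, b_d$ taking values in $\mathbb{R}^{d-1}$, $\omega \in S^{d-1}$
and $s \in \mathbb{R}$, let

\[ (a_1, \ldots, a_d) = (R_{\omega} b_1 + sq, \ldots, R_{\omega} b_d + sq). \]

The Jacobian of this map is

\[ \left| \det\left(\frac{\partial(a_1, \ldots, a_d)}{\partial(\omega, s, b_1, \ldots, b_d)}\right) \right| = (d-1)!\, \langle \omega | q \rangle\, \mathrm{Vol}(\Delta(b_1, \ldots, b_d)). \]

Proof So that we can apply Theorem 2.5.2, we will decompose the map into three simpler
maps:

\[
\begin{aligned}
(b_1, \ldots, b_d, s, \omega) &\mapsto \left(b_1 + R_{\omega}^{-1}(sq - r\omega), \ldots, b_d + R_{\omega}^{-1}(sq - r\omega), s, \omega\right) \\
&\mapsto \left(b_1 + R_{\omega}^{-1}(sq - r\omega), \ldots, b_d + R_{\omega}^{-1}(sq - r\omega), r, \omega\right) \\
&\mapsto \left(R_{\omega}\left(b_1 + R_{\omega}^{-1}(sq - r\omega)\right) + r\omega, \ldots, R_{\omega}\left(b_d + R_{\omega}^{-1}(sq - r\omega)\right) + r\omega\right) \\
&= (R_{\omega} b_1 + sq, \ldots, R_{\omega} b_d + sq).
\end{aligned}
\]

As $sq - r\omega$ is orthogonal to $\omega$, $R_{\omega}^{-1}(sq - r\omega)$ can be interpreted as a vector in the $d-1$
dimensional space in which $b_1, \ldots, b_d$ lie. So, the first map is just a translation, and its
Jacobian is 1. The Jacobian of the second map is

\[ \left| \frac{\partial r}{\partial s} \right| = \langle q | \omega \rangle. \]

Finally, we note

\[ \mathrm{Vol}\left(\Delta\left(b_1 + R_{\omega}^{-1}(sq - r\omega), \ldots, b_d + R_{\omega}^{-1}(sq - r\omega)\right)\right) = \mathrm{Vol}(\Delta(b_1, \ldots, b_d)), \]

and that the third map is one described in Theorem 2.5.2. ■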

In Section 4.2, we will need to represent $\omega$ by $c = \langle \omega | q \rangle$ and $\psi \in S^{d-2}$, where $\psi$ gives
the location of $\omega$ in the cross-section of $S^{d-1}$ for which $\langle \omega | q \rangle = c$. Formally, the map can
be defined in a coordinate system with first coordinate $q$ by

\[ \omega = \left(c, \psi\sqrt{1 - c^{2}}\right). \]

For this change of variables, we have:

Proposition 2.5.4 (Latitude and longitude) The Jacobian of the change of variables
from $\omega$ to $(c, \psi)$ is

\[ \left| \det\left(\frac{\partial(\omega)}{\partial(c, \psi)}\right) \right| = (1 - c^{2})^{(d-3)/2}. \]
Proof We begin by changing $\omega$ to $(\theta, \psi)$, where $\theta$ is the angle between $\omega$ and $q$, and $\psi$
represents the position of $\omega$ in the $d-2$ dimensional sphere of radius $\sin(\theta)$ of points
at angle $\theta$ to $q$. To compute the Jacobian of this change of variables, we choose a local
coordinate system on $S^{d-1}$ at $\omega$ by taking the great circle through $\omega$ and $q$, and then an
arbitrary coordinatization of the great $d-2$ dimensional sphere through $\omega$ orthogonal to
the great circle. In this coordinate system, $\theta$ is the position of $\omega$ along the first great circle.
As the $d-2$ dimensional sphere of points at angle $\theta$ to $q$ is orthogonal to the great circle
at $\omega$, the coordinates in $\psi$ can be mapped orthogonally into the coordinates of the great
$d-2$ dimensional sphere, the only difference being the radii of the sub-spheres. Thus,

\[ \left| \det\left(\frac{\partial(\omega)}{\partial(\theta, \psi)}\right) \right| = \sin(\theta)^{d-2}. \]

If we now let $c = \cos(\theta)$, then we find

\[ \left| \det\left(\frac{\partial(\omega)}{\partial(c, \psi)}\right) \right| = \left| \det\left(\frac{\partial(\omega)}{\partial(\theta, \psi)}\right) \right| \left| \frac{\partial(\theta)}{\partial(c)} \right| = \left(\sqrt{1 - c^{2}}\right)^{d-2} \frac{1}{\sqrt{1 - c^{2}}} = \left(\sqrt{1 - c^{2}}\right)^{d-3}. \]
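One consequence of the latitude and longitude Jacobian that is easy to test is the recursion for the surface area of spheres: integrating $(1 - c^{2})^{(d-3)/2}$ over $c \in [-1, 1]$ and over $\psi \in S^{d-2}$ must give the area of $S^{d-1}$. The sketch below is an illustration only; it uses the standard formula $2\pi^{d/2}/\Gamma(d/2)$ for the area of $S^{d-1}$ and a midpoint rule, and the function names are chosen here.

```python
import math


def sphere_area(d: int) -> float:
    """Surface area of the unit sphere S^(d-1) in R^d."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)


def area_by_latitude(d: int, steps: int = 200_000) -> float:
    """Area of S^(d-1) from the (c, psi) coordinates.

    Integrates the Jacobian (1 - c^2)^((d-3)/2) over c in [-1, 1] with the
    midpoint rule and multiplies by the area of S^(d-2), the range of psi.
    """
    h = 2.0 / steps
    integral = 0.0
    for k in range(steps):
        c = -1.0 + (k + 0.5) * h
        integral += (1.0 - c * c) ** ((d - 3) / 2) * h
    return sphere_area(d - 1) * integral


if __name__ == "__main__":
    for d in (3, 4, 5, 6):
        print(d, sphere_area(d), area_by_latitude(d))
```

For $d = 3$ the Jacobian is identically 1, which recovers the classical fact that the area of a band on the 2-sphere depends only on its height.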
3 The Shadow Vertex Method



In this section, we will review the shadow vertex method and formally state the two-phase 
method analyzed in this paper. We will begin by motivating the method. In Section 3.1, 
we will explain how the method works assuming a feasible vertex is known. In Section 3.2, 
we present a polar perspective on the method, from which our analysis is most natural. We 
then present a complete two-phase method in Section 3.3. For a more complete exposition
of the Shadow Vertex Method, we refer the reader to [Bor80, Chapter 1].

The shadow-vertex simplex method is motivated by the observation that the simplex
method is very simple in two dimensions: the set of feasible points forms a (possibly open)
polygon, and the simplex method merely walks along the exterior of the polygon. The
shadow-vertex method lifts the simplicity of the simplex method in two dimensions to
higher dimensions. Let $z$ be the objective function of a linear program and let $t$ be an
objective function optimized by $x$, a vertex of the polytope of feasible points for the linear
program. The shadow-vertex method considers the shadow of the polytope, that is, the projection
of the polytope onto the plane spanned by $z$ and $t$. One can verify that

(1) this shadow is a (possibly open) polygon, 

(2) each vertex of the polygon is the image of a vertex of the polytope, 

(3) each edge of the polygon is the image of an edge between two adjacent vertices of the 
polytope, 

(4) the projection of x onto the plane is a vertex of the polygon, and 

(5) the projection of the vertex optimizing z onto the plane is a vertex of the polygon. 

Thus, if one walks along the vertices of the polygon starting from the image of x, and keeps 
track of the vertices' pre-images on the polytope, then one will eventually encounter the 
vertex of the polytope optimizing z. Given one vertex of the polytope that maps to a vertex 
of the polygon, it is easy to find the vertex of the polytope that maps to the next vertex
of the polygon: fact (3) implies that it must be a neighbor of the vertex on the polytope; 
moreover, for a linear program that is in general position with respect to t, there will be 
d such vertices. Thus, the method will be efficient provided that the shadow polygon does 
not have too many vertices. This is the motivation for the shadow vertex method. 
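To make the notion of a shadow concrete, the following sketch projects the vertices of a small polytope onto the plane spanned by $z$ and $t$ and counts the vertices of the resulting polygon. It is a brute-force illustration written for this purpose, not the paper's method: the polytope is assumed to be given by an explicit vertex list, the plane coordinates are simply $(\langle z|v\rangle, \langle t|v\rangle)$, and the hull is computed with the standard monotone-chain algorithm.

```python
from itertools import product

Point = tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Cross product of (a - o) and (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: list[Point]) -> list[Point]:
    """Vertices of the convex hull in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: list[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: list[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def shadow_polygon(vertices, z, t) -> list[Point]:
    """Shadow of conv(vertices) on span(z, t), in (<z|v>, <t|v>) coordinates."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    return convex_hull([(dot(z, v), dot(t, v)) for v in vertices])


if __name__ == "__main__":
    cube = list(product((-1, 1), repeat=3))
    print(len(shadow_polygon(cube, (1, 1, 0), (0, 1, 1))))  # a hexagon
    print(len(shadow_polygon(cube, (1, 0, 0), (0, 1, 0))))  # a square
```

The cube has 8 vertices but its shadows here have only 6 or 4; the point of Section 4 is that, after perturbation, shadows stay small in expectation even when the polytope has many vertices.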

3.1 Formal Description 

Our description of the shadow vertex simplex method will be facilitated by the following 
definition: 

Definition 3.1.1 (optVert) Given vectors $z, a_1, \ldots, a_n$ in $\mathbb{R}^{d}$ and $y \in \mathbb{R}^{n}$, we define
$\mathrm{optVert}_{z}(a_1, \ldots, a_n; y)$ to be the set of $x$ solving

    maximize    $z^{T}x$
    subject to  $\langle a_i | x \rangle \le y_i$, for $1 \le i \le n$.

If there are no such $x$, either because the program is unbounded or infeasible, we let
$\mathrm{optVert}_{z}(a_1, \ldots, a_n; y)$ be $\emptyset$. When $a_1, \ldots, a_n$ and $y$ are understood, we will use the
notation $\mathrm{optVert}_{z}$.

Figure 1: A shadow of a polytope

We note that, for linear programs in general position, $\mathrm{optVert}_{z}$ will either be empty or
contain one vertex.

Using this definition, we will give a description of the shadow vertex method assuming
that a vertex $x_0$ and a vector $t$ are known for which $\mathrm{optVert}_{t} = \{x_0\}$. An algorithm that
works without this assumption will be described in Section 3.3. Given $t$ and $z$, we define
objective functions interpolating between the two by

\[ q_{\lambda} = (1 - \lambda)t + \lambda z. \]

The shadow-vertex method will proceed by varying $\lambda$ from 0 to 1, and tracking $\mathrm{optVert}_{q_{\lambda}}$.
We will denote the vertices encountered by $x_0, x_1, \ldots, x_k$, and we will set $\lambda_i$ so that $x_i \in
\mathrm{optVert}_{q_{\lambda}}$ for $\lambda \in [\lambda_i, \lambda_{i+1}]$.

As our main motivation for presenting the primal algorithm is to develop intuition in 
the reader, we will not dwell on issues of degeneracy in its description. We will present a 
polar version of this algorithm with a proof of correctness in the next section. 
primal shadow-vertex method

Input: $a_1, \ldots, a_n$, $y$, $z$, and $x_0$ and $t$ satisfying $\{x_0\} = \mathrm{optVert}_{t}(a_1, \ldots, a_n; y)$.

(1) Set $\lambda_0 = 0$, and $i = 0$.

(2) Set $\lambda_1$ to be maximal such that $\{x_0\} = \mathrm{optVert}_{q_{\lambda}}$ for $\lambda \in [\lambda_0, \lambda_1]$.

(3) while $\lambda_{i+1} < 1$,

    (a) Set $i = i + 1$.

    (b) Find an $x_i$ for which there exists a $\lambda_{i+1} > \lambda_i$ such that $x_i \in
        \mathrm{optVert}_{q_{\lambda}}$ for $\lambda \in [\lambda_i, \lambda_{i+1}]$. If no such $x_i$ exists, return unbounded.

    (c) Let $\lambda_{i+1}$ be maximal such that $x_i \in \mathrm{optVert}_{q_{\lambda}}$ for $\lambda \in [\lambda_i, \lambda_{i+1}]$.

(4) return $x_i$.



Step (b) of this algorithm deserves further explanation. Assuming that the linear
program is in general position with respect to $t$, each vertex $x_i$ will have exactly $d$ neighbors,
and the vertex $x_{i+1}$ will be one of these [Bor80, Lemma 1.3]. Thus, the algorithm can be
described as a simplex method. While one could implement the method by examining these
$d$ vertices in turn, more efficient implementations are possible. For an efficient
implementation of this algorithm in tableau form, we point the reader to the exposition in [Bor80,
Section 1.3].
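The idea of tracking the optimal vertex as $\lambda$ moves from 0 to 1 can be illustrated by brute force. The sketch below is deliberately naive and is not the pivoting implementation referred to above: it assumes the polytope is bounded and given by an explicit vertex list, samples $\lambda$ on a uniform grid (so a vertex that is optimal only on a very short interval could be missed), and re-optimizes from scratch at each sample. The function name `shadow_path` and the example objectives are choices made here.

```python
from itertools import product


def dot(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))


def shadow_path(vertices, t, z, steps: int = 1000):
    """Vertices optimal for q_lambda = (1 - lambda) t + lambda z, in order.

    lambda runs over the grid 0, 1/steps, ..., 1; consecutive repeats of the
    same optimal vertex are collapsed into one entry.
    """
    path = []
    for k in range(steps + 1):
        lam = k / steps
        q = [(1.0 - lam) * ti + lam * zi for ti, zi in zip(t, z)]
        best = max(vertices, key=lambda v: dot(q, v))
        if not path or path[-1] != best:
            path.append(best)
    return path


if __name__ == "__main__":
    cube = list(product((-1, 1), repeat=3))
    t = (1.0, 0.5, 0.25)      # optimized by the vertex (1, 1, 1)
    z = (-1.0, 0.6, -0.3)     # optimized by the vertex (-1, 1, -1)
    print(shadow_path(cube, t, z))
```

In this example each step of the path changes exactly one coordinate, that is, it moves to a neighboring vertex of the cube, which is the behavior that step (b) exploits.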

3.2 Polar Description 

Following Borgwardt [Bor80], we will analyze the shadow vertex method from a polar
perspective. This polar perspective is natural provided that all $y_i > 0$. In this section, we will
describe a polar variant of the shadow-vertex method that works under this assumption. In
the next section, we will describe a two-phase shadow vertex method that uses this polar
variant to solve linear programs with arbitrary $y_i$'s.

While it is not strictly necessary for the results in this paper, we remind the reader
that for a polytope $P = \{x : \langle x | a_i \rangle \le 1, \forall i\}$, the polar of $P$ is $\{y : \langle x | y \rangle \le 1, \forall x \in P\}$.
An equivalent definition of the polar is $\mathrm{ConvHull}(0, a_1, \ldots, a_n)$. We remark that $P$ is
bounded if and only if 0 is in the interior of $\mathrm{ConvHull}(a_1, \ldots, a_n)$. The polar motivates:

Definition 3.2.1 (optSimp) For $z$ and $a_1, \ldots, a_n$ in $\mathbb{R}^{d}$ and $y \in \mathbb{R}^{n}$, $y_i > 0$, we let
$\mathrm{optSimp}_{z}(a_1, \ldots, a_n; y)$ denote the set of $I \in \binom{[n]}{d}$ such that $A_I$ has full rank, $\Delta((a_i/y_i)_{i \in I})$
is a facet of $\mathrm{ConvHull}(0, a_1/y_1, \ldots, a_n/y_n)$ and $z \in \mathrm{Cone}((a_i)_{i \in I})$. When $y$ is
understood to be $\mathbf{1}$, we will use the notation $\mathrm{optSimp}_{z}(a_1, \ldots, a_n)$. When $a_1, \ldots, a_n$ and $y$ are
understood, we will use the notation $\mathrm{optSimp}_{z}$.

We remark that for $y$, $z$ and $a_1, \ldots, a_n$ in general position, $\mathrm{optSimp}_{z}(a_1, \ldots, a_n; y)$
will be the empty set or contain just one set of indices $I$.

The following proposition follows from the duality theory of linear programming: 
Figure 2: In example (a), $\mathrm{optSimp} = \{\{a_1, a_2, a_3\}\}$. In example (b), $\mathrm{optSimp} =
\{\{a_1, a_2, a_3\}, \{a_2, a_3, a_4\}\}$. In example (c), $\mathrm{optSimp} = \emptyset$.


Proposition 3.2.2 (Duality) For $y_1, \ldots, y_n > 0$, $I \in \mathrm{optSimp}_{z}(a_1/y_1, \ldots, a_n/y_n)$ if
and only if there exists an $x$ such that $x \in \mathrm{optVert}_{z}(a_1, \ldots, a_n; y)$ and $\langle x | a_i \rangle = y_i$, for
$i \in I$.
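The definition of optSimp and the duality statement can be illustrated together in the plane. The sketch below is a brute-force illustration for $d = 2$ only, with names chosen here: for every pair $I = \{i, j\}$ it solves $\langle a_i | x \rangle = y_i$, $\langle a_j | x \rangle = y_j$, keeps the pair if $x$ satisfies all constraints (equivalently, the segment through $a_i/y_i$ and $a_j/y_j$ is a facet of the convex hull with the origin), and then tests whether $z$ lies in the cone of $a_i$ and $a_j$. The $x$ returned with a surviving pair is the primal optimal vertex promised by duality.

```python
from itertools import combinations


def solve2(m, rhs):
    """Solve the 2x2 system m @ x = rhs by Cramer's rule; None if singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        return None
    return ((rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - rhs[0] * c) / det)


def opt_simp_2d(z, a, y, tol: float = 1e-9):
    """Brute-force optSimp_z(a_1/y_1, ..., a_n/y_n) for d = 2 and y > 0.

    Returns a list of ((i, j), x) with 0-based indices, where x is the
    vertex defined by the two tight constraints.
    """
    result = []
    for i, j in combinations(range(len(a)), 2):
        x = solve2((a[i], a[j]), (y[i], y[j]))
        if x is None:
            continue  # A_I does not have full rank
        # Facet test: x must satisfy every constraint <a_k | x> <= y_k.
        if any(ak[0] * x[0] + ak[1] * x[1] > yk + tol for ak, yk in zip(a, y)):
            continue
        # Cone test: z = alpha * a_i + beta * a_j with alpha, beta >= 0.
        alpha, beta = solve2(((a[i][0], a[j][0]), (a[i][1], a[j][1])), z)
        if alpha >= -tol and beta >= -tol:
            result.append(((i, j), x))
    return result


if __name__ == "__main__":
    square = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
    print(opt_simp_2d((1.0, 2.0), square, [1.0] * 4))
    # An unbounded program: z is outside the cone of the constraint vectors.
    print(opt_simp_2d((-1.0, -1.0), [(1.0, 0.0), (0.0, 1.0)], [1.0, 1.0]))
```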

We now state the polar shadow vertex method. 



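polar shadow-vertex method

Input:

• $a_1, \ldots, a_n$, $z$, and $y_1, \ldots, y_n > 0$,

• $I \in \binom{[n]}{d}$ and $t$ satisfying $I \in \mathrm{optSimp}_{t}(a_1/y_1, \ldots, a_n/y_n)$.

(1) Set $\lambda_0 = 0$ and $i = 0$.

(2) Set $\lambda_1$ to be maximal such that for $\lambda \in [\lambda_0, \lambda_1]$,

        $I \in \mathrm{optSimp}_{q_{\lambda}}(a_1/y_1, \ldots, a_n/y_n)$.

(3) while $\lambda_{i+1} < 1$,

    (a) Set $i = i + 1$.

    (b) Find a $j$ and $k$ for which there exists a $\lambda_{i+1} > \lambda_i$ such that

        $I \cup \{j\} - \{k\} \in \mathrm{optSimp}_{q_{\lambda}}(a_1/y_1, \ldots, a_n/y_n)$

        for $\lambda \in [\lambda_i, \lambda_{i+1}]$. If no such $j$ and $k$ exist, return unbounded.

    (c) Set $I = I \cup \{j\} - \{k\}$.

    (d) Let $\lambda_{i+1}$ be maximal such that $I \in \mathrm{optSimp}_{q_{\lambda}}(a_1/y_1, \ldots, a_n/y_n)$
        for $\lambda \in [\lambda_i, \lambda_{i+1}]$.

(4) return $I$.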



The $x$ optimizing the linear program, namely $\mathrm{optVert}_{z}(a_1, \ldots, a_n; y)$, is given by the
equations $\langle x | a_i \rangle = y_i$, for $i \in I$.

Borgwardt [Bor80, Lemma 1.9] establishes that such $j$ and $k$ can be found in step (b) if
there exists an $\epsilon$ for which $\mathrm{optSimp}_{q_{\lambda_i + \epsilon}}(a_1/y_1, \ldots, a_n/y_n) \ne \emptyset$. That the algorithm may
conclude that the program is unbounded if a $j$ and $k$ cannot be found in step (b) follows
from:

Proposition 3.2.3 (Detecting unbounded programs) If there is an $i$ and an $\epsilon > 0$
such that $\lambda_i + \epsilon < 1$ and $\mathrm{optSimp}_{q_{\lambda_i + \epsilon}}(a_1/y_1, \ldots, a_n/y_n) = \emptyset$, then $\mathrm{optSimp}_{z}(a_1/y_1, \ldots, a_n/y_n) = \emptyset$.

Proof $\mathrm{optSimp}_{q_{\lambda_i + \epsilon}}(a_1/y_1, \ldots, a_n/y_n) = \emptyset$ if and only if $q_{\lambda_i + \epsilon} \notin \mathrm{Cone}(a_1, \ldots, a_n)$.
The proof now follows from the facts that $\mathrm{Cone}(a_1, \ldots, a_n)$ is a convex set and $q_{\lambda_i + \epsilon}$ is a
positive multiple of a convex combination of $t$ and $z$. ■

The running time of the shadow-vertex method is bounded by the number of vertices in
the shadow of the polytope defined by the constraints of the linear program. Formally, this is

Definition 3.2.4 (Shadow) For independent vectors $t$ and $z$, $a_1, \ldots, a_n$ in $\mathbb{R}^{d}$ and $y \in \mathbb{R}^{n}$, $y > 0$,

\[ \mathrm{Shadow}_{t,z}(a_1, \ldots, a_n; y) \stackrel{\mathrm{def}}{=} \bigcup_{q \in \mathrm{Span}(t,z)} \left\{ \mathrm{optSimp}_{q}(a_1/y_1, \ldots, a_n/y_n) \right\}. \]

If $y$ is understood to be $\mathbf{1}$, we will just write $\mathrm{Shadow}_{t,z}(a_1, \ldots, a_n)$.
3.3 Two-Phase Method 

We now describe a two-phase shadow vertex method that solves linear programs of the form

    maximize    $z^{T}x$
    subject to  $\langle a_i | x \rangle \le y_i$, for $1 \le i \le n$.    (LP)

There are three issues that we must resolve before we can apply the polar shadow vertex 
method as described in Section 3.2 to the solution of such programs: 

(1) the method must know a feasible vertex of the linear program, 

(2) the linear program might not even be feasible, and 

(3) some yi might be non-positive. 

The first two issues are standard motivations for two-phase methods, while the third is 
motivated by the polar perspective from which we prefer to analyze the shadow vertex 
method. We resolve these issues in two stages. We first relax the constraints of LP to 
construct a linear program LP' such that 

(a) the right-hand vector of the linear program is positive, and 

(b) we know a feasible vertex of the linear program.
After solving LP', we construct another linear program, LP$^{+}$, in one higher dimension that
interpolates between LP and LP'. LP$^{+}$ has properties (a) and (b), and we can use the
shadow vertex method on LP$^{+}$ to transform the solution to LP' into a solution of LP.

Our two-phase method first chooses a $d$-set $I$ to define the known feasible vertex of LP'.
The linear program LP' is determined by $A$, $z$ and the choice of $I$. However, the magnitude
of the right-hand entries in LP' depends upon $s_{\min}(A_I)$. To reduce the chance that these
entries will need to be large, we examine several randomly chosen $d$-sets, and use the one
maximizing $s_{\min}$.

The algorithm then sets

\[ M = 2^{\lceil \lg(\max_i \|(y_i, a_i)\|) \rceil + 2}, \qquad \kappa = 2^{\lfloor \lg(s_{\min}(A_I)) \rfloor}, \quad \text{and} \]

\[ y'_i = \begin{cases} M & \text{for } i \in I \\ \sqrt{d}\,M^{2}/4\kappa & \text{otherwise.} \end{cases} \]

These define the program LP':

    maximize    $z^{T}x$
    subject to  $\langle a_i | x \rangle \le y'_i$, for $1 \le i \le n$.    (LP')

By Proposition 3.3.1, $A_I$ is a feasible basis for LP', and optimizes any objective function
of the form $A_I\alpha$, for $\alpha > 0$. Our two-phase algorithm will solve LP' by starting the polar
shadow-vertex algorithm at the basis $I$ and the objective function $A_I\alpha$ for a randomly
chosen $\alpha$ satisfying $\sum_i \alpha_i = 1$ and $\alpha_i \ge 1/d^{2}$ for all $i$.
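The quantities $M$, $\kappa$ and $y'$ are simple to compute once $s_{\min}(A_I)$ is known. The sketch below is a hedged illustration: the function name `phase_one_rhs` is introduced here, indices are 0-based, and the smallest singular value is taken as an input parameter (an assumption made to keep the example free of linear-algebra dependencies) rather than computed from $A_I$.

```python
import math


def phase_one_rhs(a, y, basis, s_min):
    """Return (M, kappa, y_prime) for LP'.

    a      : list of constraint vectors a_i (each of length d)
    y      : list of right-hand sides y_i of the original program
    basis  : set of 0-based indices forming the chosen d-set I
    s_min  : smallest singular value of A_I, assumed to be supplied
    """
    d = len(a[0])
    max_norm = max(math.sqrt(yi * yi + sum(c * c for c in ai))
                   for ai, yi in zip(a, y))
    big_m = 2.0 ** (math.ceil(math.log2(max_norm)) + 2)
    kappa = 2.0 ** math.floor(math.log2(s_min))
    off_basis = math.sqrt(d) * big_m ** 2 / (4.0 * kappa)
    y_prime = [big_m if i in basis else off_basis for i in range(len(a))]
    return big_m, kappa, y_prime


if __name__ == "__main__":
    a = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
    y = [1.0, 1.0, 1.0]
    # For I = {0, 1}, A_I is the identity, so its smallest singular value is 1.
    print(phase_one_rhs(a, y, {0, 1}, s_min=1.0))
```

Rounding to powers of two means that $M$ is at least four times the largest norm $\|(y_i, a_i)\|$ and $\kappa$ is at most $s_{\min}(A_I)$, which are the two properties the correctness argument uses.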

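Proposition 3.3.1 (Initial simplex of LP') For any $\alpha > 0$, $I = \mathrm{optSimp}_{A_I\alpha}(a_1, \ldots, a_n; y')$.

Proof Let $x'$ be the solution of the linear system

\[ \langle a_i | x' \rangle = y'_i, \quad \text{for } i \in I. \]

By Definition 2.2.3 and Proposition 2.2.4 (a),

\[ \|x'\| \le \|y'_I\|\, \|A_I^{-1}\| \le M\sqrt{d}\, \|A_I^{-1}\| = M\sqrt{d}/s_{\min}(A_I). \]

So, for all $i \notin I$,

\[ \langle a_i | x' \rangle \le \left(\max_i \|a_i\|\right) M\sqrt{d}/s_{\min}(A_I) \le M^{2}\sqrt{d}/4\kappa. \]

Thus, for all $i \notin I$,

\[ \langle a_i | x' \rangle \le y'_i, \]

and, by Definition 3.2.1, $I = \mathrm{optSimp}_{A_I\alpha}(a_1, \ldots, a_n; y')$. ■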

We will now define a linear program LP$^{+}$ that interpolates between LP' and LP. This
linear program will contain an extra variable $x_0$ and constraints of the form

\[ \langle a_i | x \rangle \le \left(\frac{1 + x_0}{2}\right) y_i + \left(\frac{1 - x_0}{2}\right) y'_i \]

and $-1 \le x_0 \le 1$. So, for $x_0 = 1$ we see the original program LP while for $x_0 = -1$ we get
LP'. Formally, we let

\[ a_i^{+} = \begin{cases} \left((y'_i - y_i)/2,\, a_i\right) & \text{for } 1 \le i \le n \\ (1, 0, \ldots, 0) & \text{for } i = 0 \\ (-1, 0, \ldots, 0) & \text{for } i = -1 \end{cases} \]

\[ y_i^{+} = \begin{cases} (y'_i + y_i)/2 & \text{for } 1 \le i \le n \\ 1 & \text{for } i = 0 \\ 1 & \text{for } i = -1 \end{cases} \]

\[ z^{+} = (1, 0, \ldots, 0), \]

and we define LP$^{+}$ by

    maximize    $\langle z^{+} | (x_0, x) \rangle$
    subject to  $\langle a_i^{+} | (x_0, x) \rangle \le y_i^{+}$, for $-1 \le i \le n$,    (LP$^{+}$)

and we set

\[ y^{+} = (y_{-1}^{+}, \ldots, y_n^{+}). \]

By Proposition 3.3.2, $\sqrt{d}M/4\kappa \ge 1$, so $y'_i \ge M$ and $y_i^{+} > 0$, for all $i$. If LP is infeasible,
then the solution to LP$^{+}$ will have $x_0 < 1$. If LP is feasible, then the solution to LP$^{+}$ will
have form $(1, x)$ where $x$ is a feasible point for LP. If we use the shadow-vertex method to
solve LP$^{+}$ starting from the appropriate initial vector, then $x$ will be an optimal solution
to LP.
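The lifted program can be assembled mechanically from $a$, $y$ and $y'$. The sketch below is illustrative only; the function name `lift_to_lp_plus` and the dictionary layout (keys $-1, 0, 1, \ldots, n$ mirroring the paper's indexing) are choices made here. The check at the end confirms the interpolation property: at $x_0 = 1$ the slack of lifted constraint $i$ equals the slack of LP, and at $x_0 = -1$ it equals the slack of LP'.

```python
def dot(u, v) -> float:
    return sum(p * q for p, q in zip(u, v))


def lift_to_lp_plus(a, y, y_prime):
    """Build the rows (a_i^+, y_i^+) of LP+ and the objective z^+.

    Returns (rows, z_plus) where rows maps index i in {-1, 0, 1, ..., n}
    to a pair (a_i^+, y_i^+); the first coordinate multiplies x_0.
    """
    d = len(a[0])
    rows = {}
    for i, (ai, yi, ypi) in enumerate(zip(a, y, y_prime), start=1):
        rows[i] = ([(ypi - yi) / 2.0, *ai], (ypi + yi) / 2.0)
    rows[0] = ([1.0] + [0.0] * d, 1.0)     # x_0 <= 1
    rows[-1] = ([-1.0] + [0.0] * d, 1.0)   # -x_0 <= 1
    z_plus = [1.0] + [0.0] * d
    return rows, z_plus


if __name__ == "__main__":
    a = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
    y = [1.0, 1.0, 1.0]
    y_prime = [8.0, 8.0, 22.0]
    rows, z_plus = lift_to_lp_plus(a, y, y_prime)
    x = (0.3, -0.2)
    for i in (1, 2, 3):
        row, rhs = rows[i]
        print(i, rhs - dot(row, (1.0, *x)), rhs - dot(row, (-1.0, *x)))
```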

Proposition 3.3.2 (Relation of $M$ and $\kappa$) For $M$ and $\kappa$ as set by the algorithm, $\sqrt{d}M/4\kappa \ge 1$.

Proof By definition, $\kappa \le s_{\min}(A_I)$. On the other hand,

\[ s_{\min}(A_I) \le \|A_I\| \le \sqrt{d} \max_i \|a_i\|, \]

by Proposition 2.2.4 (d). Finally, $M \ge 4\max_i \|a_i\|$. ■

We now state and prove the correctness of the two-phase shadow vertex method. 



two-phase shadow-vertex method

Input: $A = (a_1, \ldots, a_n)$, $y$, $z$.

(1) Let $\mathcal{I} = \{I_1, \ldots, I_{3nd\ln n}\}$ be a collection of randomly chosen sets in
    $\binom{[n]}{d}$, and let $I \in \mathcal{I}$ be the set maximizing $s_{\min}(A_I)$.

(2) Set $M = 2^{\lceil \lg(\max_i \|(y_i, a_i)\|) \rceil + 2}$ and $\kappa = 2^{\lfloor \lg(s_{\min}(A_I)) \rfloor}$.

(3) Set $y'_i = M$ for $i \in I$, and $y'_i = \sqrt{d}\,M^{2}/4\kappa$ otherwise.

(4) Choose $\alpha$ uniformly at random from $\{\alpha : \sum_i \alpha_i = 1 \text{ and } \alpha_i \ge 1/d^{2}\}$.
    Set $t' = A_I\alpha$.

(5) Let $J$ be the output of the polar shadow vertex algorithm on LP' on
    input $I$ and $t'$. If LP' is unbounded, then return unbounded.

(6) Let $\zeta > 0$ be such that

        $\{-1\} \cup J \in \mathrm{optSimp}_{(-\zeta, z)}(a_{-1}^{+}/y_{-1}^{+}, \ldots, a_n^{+}/y_n^{+})$.

(7) Let $K$ be the output of the polar shadow vertex algorithm on LP$^{+}$ on
    input $\{-1\} \cup J$, $(-\zeta, z)$.

(8) Compute $(x_0, x)$ satisfying $\langle (x_0, x) | a_i^{+} \rangle = y_i^{+}$ for $i \in K$.

(9) If $x_0 < 1$, return infeasible. Otherwise, return $x$.



The following propositions prove the correctness of the algorithm. 
Proposition 3.3.3 (Unbounded programs) The following are equivalent

(a) LP is unbounded;

(b) LP' is unbounded;

(c) there exists a $1 > \lambda > 0$ such that $\mathrm{optSimp}_{\lambda(1,0) + (1-\lambda)(-\zeta, z)}(a_{-1}^{+}, \ldots, a_n^{+}; y^{+}) = \emptyset$;

(d) for all $1 > \lambda > 0$, $\mathrm{optSimp}_{\lambda(1,0) + (1-\lambda)(-\zeta, z)}(a_{-1}^{+}, \ldots, a_n^{+}; y^{+}) = \emptyset$.

Proposition 3.3.4 (Bounded programs) If LP' is bounded and has solution $J$, then

(a) there exists $\zeta_0$ such that for all $\zeta > \zeta_0$, $\{-1\} \cup J \in \mathrm{optSimp}_{(-\zeta, z)}(a_{-1}^{+}, \ldots, a_n^{+}; y^{+})$;

(b) if LP is feasible, then for $K' \in \mathrm{optSimp}_{z}(a_1, \ldots, a_n; y)$, there exists $\xi_0$ such that for
    all $\xi > \xi_0$, $\{0\} \cup K' \in \mathrm{optSimp}_{(\xi, z)}(a_{-1}^{+}, \ldots, a_n^{+}; y^{+})$, and

(c) if we use the shadow vertex method to solve LP$^{+}$ starting from $\{-1\} \cup J$ and objective
    function $(-\zeta, z)$, then the output of the algorithm will have form $\{0\} \cup K'$, where $K'$
    is a solution to LP.
Proof of Proposition 3.3.3 LP is unbounded if and only if there exists a vector $v$
such that $\langle z | v \rangle > 0$ and $\langle a_i | v \rangle \le 0$ for all $i$. The same holds for LP', and establishes the
equivalence of (a) and (b). To show that (a) or (b) implies (d), observe

\[
\begin{aligned}
\langle \lambda(1, 0) + (1 - \lambda)(-\zeta, z) \,|\, (0, v) \rangle &= (1 - \lambda)\langle z | v \rangle > 0, && (8) \\
\langle a_i^{+} | (0, v) \rangle &= \langle a_i | v \rangle, \quad \text{for } i = 1, \ldots, n, && (9) \\
\langle a_0^{+} | (0, v) \rangle &= 0, \quad \text{and} \\
\langle a_{-1}^{+} | (0, v) \rangle &= 0.
\end{aligned}
\]

To show that (c) implies (a) and (b), note that $a_0^{+}$ and $a_{-1}^{+}$ are arranged so that if for some
$v_0$ we have

\[ \langle a_i^{+} | (v_0, v) \rangle \le 0, \quad \text{for } -1 \le i \le n, \]

then $v_0 = 0$. This identity allows us to apply (8) and (9) to show (c) implies (a) and (b). ■

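Proof of Proposition 3.3.4 Let $J$ be the solution to LP' and let $x' = A_J^{-1}y'_J$ be the
corresponding vertex. We then have

\[ \langle x' | a_i \rangle = y'_i, \text{ for } i \in J, \quad \text{and} \quad \langle x' | a_i \rangle < y'_i, \text{ for } i \notin J. \]

Therefore, it is clear that

\[ \langle (-1, x') | a_i^{+} \rangle = y_i^{+}, \text{ for } i \in \{-1\} \cup J, \quad \text{and} \quad \langle (-1, x') | a_i^{+} \rangle < y_i^{+}, \text{ for } i \notin \{-1\} \cup J. \]

Thus, $\Delta\left(a_{-1}^{+}/y_{-1}^{+}, (a_i^{+}/y_i^{+})_{i \in J}\right)$ is a facet of $\mathrm{ConvHull}(0, a_{-1}^{+}/y_{-1}^{+}, \ldots, a_n^{+}/y_n^{+})$. To see that there exists a $\zeta_0$ such that it
optimizes $(-\zeta, z)$ for all $\zeta > \zeta_0$, first observe that there exist $\alpha_i > 0$, for $i \in J$, such that
$\sum_{i \in J} \alpha_i a_i = z$. Now, let $(-\zeta_0, z) = \sum_{i \in J} \alpha_i a_i^{+}$. For $\zeta > \zeta_0$, we have

\[ (-\zeta, z) = (\zeta - \zeta_0)\, a_{-1}^{+} + \sum_{i \in J} \alpha_i a_i^{+}, \]

which proves $(-\zeta, z) \in \mathrm{Cone}\left(a_{-1}^{+}, (a_i^{+})_{i \in J}\right)$ and completes the proof of (a).
The proof of (b) is similar.

To prove part (c), let $K$ be as in step (7). Then, there exists a $\lambda_k$ such that for all
$\lambda \in (\lambda_k, 1)$,

\[ K = \mathrm{optSimp}_{(1-\lambda)(-\zeta, z) + \lambda z^{+}}\left(a_{-1}^{+}, \ldots, a_n^{+}; y^{+}\right). \]

Let $(x_0, x)$ satisfy $\langle (x_0, x) | a_i^{+} \rangle = y_i^{+}$, for $i \in K$. Then, by Proposition 3.2.2,

\[ (x_0, x) = \mathrm{optVert}_{(1-\lambda)(-\zeta, z) + \lambda z^{+}}\left(a_{-1}^{+}, \ldots, a_n^{+}; y^{+}\right). \]

If $x_0 < 1$, then LP was infeasible. Otherwise, let $x^{*} = \mathrm{optVert}_{z}(a_1, \ldots, a_n; y)$. By part
(b), there exists $\xi_0$ such that for all $\xi > \xi_0$,

\[ (1, x^{*}) = \mathrm{optVert}_{(\xi, z)}\left(a_{-1}^{+}, \ldots, a_n^{+}; y^{+}\right). \]

For $\xi = -\zeta + \lambda/(1 - \lambda)$, we have

\[ (\xi, z) = \frac{1}{1 - \lambda}\left((1 - \lambda)(-\zeta, z) + \lambda z^{+}\right). \]

So, as $\lambda$ approaches 1, $\xi = -\zeta + \lambda/(1 - \lambda)$ goes to infinity and we have

\[ \mathrm{optVert}_{(1-\lambda)(-\zeta, z) + \lambda z^{+}}\left(a_{-1}^{+}, \ldots, a_n^{+}; y^{+}\right) = \mathrm{optVert}_{(\xi, z)}\left(a_{-1}^{+}, \ldots, a_n^{+}; y^{+}\right), \]

which implies $(x_0, x) = (1, x^{*})$. ■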

Finally, we bound the number of steps taken in step (7) by the shadow size of a related
polytope:

Lemma 3.3.5 (Shadow path of LP$^{+}$) Let $a_{-1}^{+}, \ldots, a_n^{+}$ and $y_{-1}^{+}, \ldots, y_n^{+}$ be as defined in
LP$^{+}$. Let $\zeta > 0$ be such that $\{-1\} \cup J = \mathrm{optSimp}_{(-\zeta, z)}(a_{-1}^{+}/y_{-1}^{+}, \ldots, a_n^{+}/y_n^{+})$. Then the
number of simplex steps made by the polar shadow vertex algorithm while solving LP$^{+}$ from
initial basis $\{-1\} \cup J$ and vector $(-\zeta, z)$ is at most

\[ 2 + \left| \mathrm{Shadow}_{(0,z),\, z^{+}}\left(a_1^{+}/y_1^{+}, \ldots, a_n^{+}/y_n^{+}\right) \right|. \]

Proof We will establish that $-1 \in I$ for the first step only. One can similarly prove
that $0 \in I$ is only true at termination.

Let $I \in \mathrm{optSimp}_{q_{\lambda}}(a_{-1}^{+}/y_{-1}^{+}, \ldots, a_n^{+}/y_n^{+})$ have form $\{-1\} \cup L$. As $-z^{+} = a_{-1}^{+} \in
\mathrm{Cone}(A_{\{-1\} \cup L})$, and $\mathrm{Cone}(A_{\{-1\} \cup L})$ is a convex set, we have $q_{\lambda'} \in \mathrm{Cone}(A_{\{-1\} \cup L})$
for all $0 \le \lambda' \le \lambda$. As $[\lambda_i, \lambda_{i+1}]$ is exactly the set of $\lambda$ optimized by $\Delta(A_I)$ in the $i$th step
of the polar shadow vertex method, $I$ must be the initial set.



3.4 Discussion 

We also note that our analysis of the two-phase algorithm actually takes advantage of the
fact that $\kappa$ and $M$ have been set to powers of two. In particular, this fact is used to show
that there are not too many likely choices for $\kappa$ and $M$. For the reader who would like to
drop this condition, we briefly explain how the argument of Section 5 could be modified to
compensate: first, we could consider setting $\kappa$ and $M$ to powers of $1 + 1/\mathrm{poly}(n, d, 1/\sigma)$.
This would still result in a polynomially bounded number of choices for $\kappa$ and $M$. One
could then drop this assumption by observing that allowing $\kappa$ and $M$ to vary in a small
range would not introduce too much dependency between the variables.
4 Shadow Size

In this section, we bound the expected size of the shadow of the perturbation of a polytope 
onto a fixed plane. This is the main geometric result of the paper. The algorithmic results 
of this paper will rely on extensions of this theorem derived in Section 4.3. 

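Theorem 4.0.1 (Shadow Size) Let $d \ge 3$ and $n > d$. Let $z$ and $t$ be independent vectors
in $\mathbb{R}^{d}$, and let $\mu_1, \ldots, \mu_n$ be Gaussian distributions in $\mathbb{R}^{d}$ of standard deviation $\sigma$ centered
at points each of norm at most 1. Then,

\[ \mathbf{E}_{a_1, \ldots, a_n}\left[\left|\mathrm{Shadow}_{t,z}(a_1, \ldots, a_n)\right|\right] \le \mathcal{D}(n, d, \sigma), \qquad (10) \]

where

\[ \mathcal{D}(n, d, \sigma) = \frac{58{,}888{,}678\; n d^{3}}{\min\left(\sigma,\, 1/3\sqrt{d \ln n}\right)^{6}}, \]

and $a_1, \ldots, a_n$ have density $\prod_{i=1}^{n} \mu_i(a_i)$.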
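To get a feel for the size of the bound, the sketch below evaluates $\mathcal{D}(n, d, \sigma)$ numerically. It is an illustration only, and it rests on an assumption: the exponents are garbled in the source text, so the code takes the bound to be $58{,}888{,}678\, n d^{3} / \min(\sigma, 1/3\sqrt{d \ln n})^{6}$. The function name `shadow_bound` is introduced here.

```python
import math


def shadow_bound(n: int, d: int, sigma: float) -> float:
    """Evaluate the shadow-size bound D(n, d, sigma).

    Assumes the form 58,888,678 * n * d^3 / min(sigma, 1/(3 sqrt(d ln n)))^6;
    the exponents 3 and 6 are a reading of a damaged formula.
    """
    if d < 3 or n <= d:
        raise ValueError("the theorem requires d >= 3 and n > d")
    cap = 1.0 / (3.0 * math.sqrt(d * math.log(n)))
    return 58_888_678 * n * d ** 3 / min(sigma, cap) ** 6


if __name__ == "__main__":
    for sigma in (1.0, 0.1, 0.01):
        print(sigma, f"{shadow_bound(100, 10, sigma):.3e}")
```

The bound is polynomial in $n$, $d$ and $1/\sigma$, and it stops improving once $\sigma$ exceeds $1/3\sqrt{d \ln n}$, which matches the scaling argument at the start of the proof.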

The proof of Theorem 4.0.1, will use the following definitions. 
Definition 4.0.2 (ang) For a vector $q$ and a set $S$, we define

\[ \mathrm{ang}(q, S) = \min_{x \in S} \mathrm{angle}(q, x). \]

If $S$ is empty, we set $\mathrm{ang}(q, \emptyset) = \infty$.

Definition 4.0.3 (ang$_q$) For a vector $q$ and points $a_1, \ldots, a_n$ in $\mathbb{R}^{d}$, we define

\[ \mathrm{ang}_{q}(a_1, \ldots, a_n) = \mathrm{ang}\left(q,\, \partial\Delta\left(\mathrm{optSimp}_{q}(a_1, \ldots, a_n)\right)\right), \]

where $\partial\Delta(\mathrm{optSimp}_{q}(a_1, \ldots, a_n))$ denotes the boundary of the simplex $\Delta(\mathrm{optSimp}_{q}(a_1, \ldots, a_n))$.

These definitions are arranged so that if the ray through $q$ does not pierce the convex
hull of $a_1, \ldots, a_n$, then $\mathrm{ang}_{q}(a_1, \ldots, a_n) = \infty$.

In our proofs, we will make frequent use of the fact that it is very unlikely that a 
Gaussian random variable is far from its mean. To capture this fact, we define: 

Definition 4.0.4 ($P$) $P$ is the set of $(a_1, \ldots, a_n)$ for which $\|a_i\| \le 2$, for all $i$.

Applying a union bound to Corollary 2.4.6, we obtain

Proposition 4.0.5 (Measure of $P$)

$$\Pr\left[(a_1,\ldots,a_n) \in P\right] \ge 1 - n\left(n^{-2.9d}\right) = 1 - n^{-2.9d+1}.$$
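The proposition says that, for $\sigma \le 1/(3\sqrt{d \ln n})$, a perturbed point essentially never leaves the ball of radius 2. The following is a small Monte Carlo sketch of that statement, not part of the proof. The function name, the choice of centers on the unit sphere, and the trial count are our own assumptions.

```python
import math
import random


def fraction_in_P(n: int, d: int, trials: int = 200, seed: int = 0) -> float:
    """Estimate Pr[(a_1, ..., a_n) in P] by sampling (illustrative only)."""
    rng = random.Random(seed)
    sigma = 1.0 / (3.0 * math.sqrt(d * math.log(n)))
    hits = 0
    for _ in range(trials):
        inside = True
        for _ in range(n):
            # Center of norm exactly 1 in a random direction.
            c = [rng.gauss(0.0, 1.0) for _ in range(d)]
            c_norm = math.sqrt(sum(x * x for x in c))
            a = [x / c_norm + rng.gauss(0.0, sigma) for x in c]
            if math.sqrt(sum(x * x for x in a)) > 2.0:
                inside = False
                break
        hits += inside
    return hits / trials
```

With `n = 20` and `d = 5` the perturbation has typical norm about 0.2, so leaving the radius-2 ball would require a deviation of more than ten standard deviations.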
Proof of Theorem 4.0.1 We first observe that we can assume $\sigma \le 1/3\sqrt{d \ln n}$: if
$\sigma > 1/3\sqrt{d \ln n}$, then we can scale down all the data until $\sigma = 1/3\sqrt{d \ln n}$. As this could
only decrease the norms of the centers of the distributions, the theorem statement would
be unaffected.

Assume without loss of generality that $z$ and $t$ are orthogonal. Let

$$q_\theta = z \sin(\theta) + t \cos(\theta). \qquad (11)$$



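We discretize the problem by using the intuitively obvious fact, which we prove as Lemma 4.0.6,
that the left-hand side of (10) equals

$$\lim_{m \to \infty} \mathbf{E}_{a_1,\ldots,a_n}\left[\left|\bigcup_{\theta \in \left\{\frac{2\pi}{m}, \frac{2 \cdot 2\pi}{m}, \ldots, \frac{m \cdot 2\pi}{m}\right\}} \left\{\mathrm{optSimp}_{q_\theta}(a_1,\ldots,a_n)\right\}\right|\right].$$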



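Let $E_i$ denote the event

$$\mathrm{optSimp}_{q_{2\pi i/m}}(a_1,\ldots,a_n) \ne \mathrm{optSimp}_{q_{2\pi((i+1) \bmod m)/m}}(a_1,\ldots,a_n).$$

Then, for any $m \ge 2$ and for all $a_1, \ldots, a_n$,

$$\left|\bigcup_{\theta \in \left\{\frac{2\pi}{m}, \frac{2 \cdot 2\pi}{m}, \ldots, \frac{m \cdot 2\pi}{m}\right\}} \left\{\mathrm{optSimp}_{q_\theta}(a_1,\ldots,a_n)\right\}\right| \le \sum_{i=1}^m E_i(a_1,\ldots,a_n).$$

We bound this sum by

$$\mathbf{E}\left[\sum_i E_i\right] = \mathbf{E}_P\left[\sum_i E_i\right] \Pr[P] + \mathbf{E}_{\bar P}\left[\sum_i E_i\right] \Pr\left[\bar P\right] \le \mathbf{E}_P\left[\sum_i E_i\right] + 1.$$

Thus, we will focus on bounding $\mathbf{E}_P\left[\sum_i E_i\right]$.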
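The discretization idea can be illustrated on a simplified analog. The sketch below does not compute $\mathrm{optSimp}$; instead it sweeps $q_\theta$ around the plane spanned by $z$ and $t$ and counts how often the maximizer of $\langle q_\theta | a_i \rangle$ changes between consecutive angles, which for fine enough $m$ equals the number of vertices of the projection of the convex hull onto that plane. The function name and this primal simplification are our own assumptions.

```python
import math


def count_changes(points, z, t, m):
    """Count indices i at which the maximizer for q_theta changes (E_i analog)."""

    def best(theta):
        q = [zi * math.sin(theta) + ti * math.cos(theta) for zi, ti in zip(z, t)]
        return max(range(len(points)),
                   key=lambda i: sum(a * b for a, b in zip(points[i], q)))

    # Angles 2*pi*i/m for i = 1, ..., m, compared cyclically.
    winners = [best(2.0 * math.pi * i / m) for i in range(1, m + 1)]
    return sum(winners[i] != winners[(i + 1) % m] for i in range(m))
```

For four points at the corners of a square in the plane, plus a fifth point lying above it, the sweep sees exactly four changes, one per corner of the projected hull.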

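Observing that $E_i$ implies $\mathrm{ang}_{q_{2\pi i/m}}(a_1,\ldots,a_n) \le 2\pi/m$, and applying linearity of
expectation, we obtain

$$\begin{aligned}
\mathbf{E}_P\left[\sum_{i=1}^m E_i\right] &= \sum_{i=1}^m \Pr_P\left[E_i\right] \\
&\le \sum_{i=1}^m \Pr_P\left[\mathrm{ang}_{q_{2\pi i/m}}(a_1,\ldots,a_n) < \frac{2\pi}{m}\right] \\
&\le 2\pi \cdot \frac{9{,}372{,}424\; n d^3}{\sigma^6} \quad \text{by Lemma 4.0.7,} \\
&\le \frac{58{,}888{,}677\; n d^3}{\sigma^6}.
\end{aligned}$$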
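Lemma 4.0.6 (Discretization in limit) Let $z$ and $t$ be orthogonal vectors in $\mathbb{R}^d$, and
let $\mu_1, \ldots, \mu_n$ be non-degenerate Gaussian distributions. Then,

$$\mathbf{E}_{a_1,\ldots,a_n}\left[\left|\bigcup_{q \in \mathrm{Span}(z,t)} \left\{\mathrm{optSimp}_q(a_1,\ldots,a_n)\right\}\right|\right] = \lim_{m \to \infty} \mathbf{E}_{a_1,\ldots,a_n}\left[\left|\bigcup_{\theta \in \left\{\frac{2\pi}{m}, \frac{2 \cdot 2\pi}{m}, \ldots, \frac{m \cdot 2\pi}{m}\right\}} \left\{\mathrm{optSimp}_{q_\theta}(a_1,\ldots,a_n)\right\}\right|\right], \qquad (12)$$

where $q_\theta$ is as defined in (11).

Proof For an $I \in \binom{[n]}{d}$, let

$$F_I(a_1,\ldots,a_n) = \int_\theta \left[\mathrm{optSimp}_{q_\theta}(a_1,\ldots,a_n) = I\right] d\theta.$$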

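The left and right hand sides of (12) can differ only if there exists a $\delta > 0$ such that for all
$\epsilon > 0$,

$$\Pr_{a_1,\ldots,a_n}\left[\exists I : I = \mathrm{optSimp}_{q_\theta}(a_1,\ldots,a_n) \text{ for some } \theta, \text{ and } F_I(a_1,\ldots,a_n) < \epsilon\right] \ge \delta.$$

As there are only finitely many choices for $I$, this would imply the existence of a $\delta'$ and a
particular $I$ such that for all $\epsilon > 0$,

$$\Pr_{a_1,\ldots,a_n}\left[I = \mathrm{optSimp}_{q_\theta}(a_1,\ldots,a_n) \text{ for some } \theta, \text{ and } F_I(a_1,\ldots,a_n) < \epsilon\right] \ge \delta'.$$

As $F_I(a_1,\ldots,a_n) = F_I(A_I)$ given that $I = \mathrm{optSimp}_{q_\theta}(a_1,\ldots,a_n)$ for some $\theta$, this implies
that for all $\epsilon > 0$,

$$\Pr_{a_1,\ldots,a_n}\left[I = \mathrm{optSimp}_{q_\theta}(A_I) \text{ for some } \theta, \text{ and } F_I(A_I) < \epsilon\right] \ge \delta'. \qquad (13)$$

Note that $I = \mathrm{optSimp}_{q_\theta}(A_I)$ if and only if $q_\theta \in \mathrm{Cone}(A_I)$. Now, let

$$G(A_I) = \int_\theta \left[q_\theta \in \mathrm{Cone}(A_I)\right] \left(\mathrm{ang}(q_\theta, \partial\triangle(A_I))/\pi\right) d\theta.$$

As $G(A_I) \le F_I(A_I)$, (13) implies that for all $\epsilon > 0$,

$$\Pr\left[I = \mathrm{optSimp}_{q_\theta}(a_1,\ldots,a_n) \text{ for some } \theta, \text{ and } G(A_I) < \epsilon\right] \ge \delta'.$$

However, $G$ is a continuous function, and therefore measurable, so this would imply

$$\Pr\left[I = \mathrm{optSimp}_{q_\theta}(a_1,\ldots,a_n) \text{ for some } \theta, \text{ and } G(A_I) = 0\right] \ge \delta',$$

which is clearly false as the set of $A_I$ satisfying

• $G(A_I) = 0$, and

• $\exists \theta : \mathrm{optSimp}_{q_\theta}(a_1,\ldots,a_n) = \{A_I\}$

has co-dimension 1, and so has measure zero under the product distribution of non-degenerate
Gaussians. ■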



Lemma 4.0.7 (Angle bound) Let $d \ge 3$ and $n > d$. Let $q$ be any unit vector and let
$\mu_1, \ldots, \mu_n$ be Gaussian measures in $\mathbb{R}^d$ of standard deviation $\sigma \le 1/3\sqrt{d \ln n}$ centered at
points of norm at most 1. Then,

$$\Pr_P\left[\mathrm{ang}_q(a_1,\ldots,a_n) < \epsilon\right] \le \frac{9{,}372{,}424\; n d^3}{\sigma^6}\, \epsilon,$$

where $a_1, \ldots, a_n$ have density $\prod_{i=1}^n \mu_i(a_i)$.

The proof will make use of the following definition:

Definition 4.0.8 ($P_I^j$) For an $I \in \binom{[n]}{d}$ and $j \in I$, we define $P_I^j$ to be the set of $a_1, \ldots, a_n$
satisfying

(1) For all $q$, if $\mathrm{optSimp}_q(a_1,\ldots,a_n) \ne \emptyset$, then $s \le 2$, where $s$ is the real number for
which $sq \in \triangle(\mathrm{optSimp}_q(a_1,\ldots,a_n))$,

(2) $\mathrm{dist}(a_i, a_k) \le 4$, for $i, k \in I - \{j\}$,

(3) $\mathrm{dist}(a_j, \mathrm{Aff}(A_{I-\{j\}})) \le 4$, and

(4) $\mathrm{dist}(a_j^\perp, a_i) \le 4$, for all $i \in I - \{j\}$, where $a_j^\perp$ is the orthogonal projection of $a_j$ onto
$\mathrm{Aff}(A_{I-\{j\}})$.

Proposition 4.0.9 ($P \subset P_I^j$) For all $j, I$, $P \subset P_I^j$.

Proof Parts (2), (3), and (4) follow immediately from the restrictions $\|a_i\| \le 2$. To see
why part (1) is true, note that $sq$ lies in the convex hull of $a_1, \ldots, a_n$, and so its norm, $s$,
can be at most $\max_i \|a_i\| \le 2$, for $(a_1,\ldots,a_n) \in P$. ■

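Proof of Lemma 4.0.7 Applying a union bound twice, we write

$$\begin{aligned}
\Pr_P\left[\mathrm{ang}_q(a_1,\ldots,a_n) < \epsilon\right]
&\le \sum_I \Pr_P\left[\mathrm{optSimp}_q(a_1,\ldots,a_n) = I \text{ and } \mathrm{ang}(q, \partial\triangle(A_I)) < \epsilon\right] \\
&\le \sum_I \sum_{j=1}^d \Pr_P\left[\mathrm{optSimp}_q(a_1,\ldots,a_n) = I \text{ and } \mathrm{ang}(q, \triangle(A_{I-\{j\}})) < \epsilon\right] \\
&\le \sum_I \sum_{j=1}^d \Pr_{P_I^j}\left[\mathrm{optSimp}_q(a_1,\ldots,a_n) = I \text{ and } \mathrm{ang}(q, \triangle(A_{I-\{j\}})) < \epsilon\right] \Big/ \Pr_{P_I^j}\left[P\right] \quad \text{(by Proposition 2.3.2)} \\
&\le \sum_I \sum_{j=1}^d \Pr_{P_I^j}\left[\mathrm{optSimp}_q(a_1,\ldots,a_n) = I \text{ and } \mathrm{ang}(q, \triangle(A_{I-\{j\}})) < \epsilon\right] \Big/ \Pr\left[P\right] \quad \text{(by } P \subset P_I^j\text{)} \\
&\le \frac{1}{1 - n^{-2.9d+1}} \sum_I \sum_{j=1}^d \Pr_{P_I^j}\left[\mathrm{optSimp}_q(a_1,\ldots,a_n) = I \text{ and } \mathrm{ang}(q, \triangle(A_{I-\{j\}})) < \epsilon\right] \quad \text{(by Proposition 4.0.5)} \\
&= \frac{1}{1 - n^{-2.9d+1}} \sum_{j=1}^d \sum_I \Pr_{P_I^j}\left[\mathrm{optSimp}_q(a_1,\ldots,a_n) = I \text{ and } \mathrm{ang}(q, \triangle(A_{I-\{j\}})) < \epsilon\right],
\end{aligned}$$

by changing the order of summation.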

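We now expand the inner summation using Bayes' rule to get

$$\begin{aligned}
&\sum_I \Pr_{P_I^j}\left[\mathrm{optSimp}_q(a_1,\ldots,a_n) = I \text{ and } \mathrm{ang}(q, \triangle(A_{I-\{j\}})) < \epsilon\right] \\
&\quad = \sum_I \Pr_{P_I^j}\left[\mathrm{optSimp}_q(a_1,\ldots,a_n) = I\right] \cdot \Pr_{P_I^j}\left[\mathrm{ang}(q, \triangle(A_{I-\{j\}})) < \epsilon \;\middle|\; \mathrm{optSimp}_q(a_1,\ldots,a_n) = I\right]. \qquad (14)
\end{aligned}$$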

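As $\mathrm{optSimp}_q(a_1,\ldots,a_n)$ is a set of size zero or one with probability 1,

$$\sum_I \Pr\left[\mathrm{optSimp}_q(a_1,\ldots,a_n) = I\right] \le 1;$$

from which we derive

$$\begin{aligned}
\sum_I \Pr_{P_I^j}\left[\mathrm{optSimp}_q(a_1,\ldots,a_n) = I\right]
&\le \sum_I \Pr\left[\mathrm{optSimp}_q(a_1,\ldots,a_n) = I\right] \Big/ \Pr\left[P_I^j\right] \quad \text{(by Proposition 2.3.2)} \\
&\le \frac{1}{1 - n^{-2.9d+1}} \sum_I \Pr\left[\mathrm{optSimp}_q(a_1,\ldots,a_n) = I\right] \quad \text{(by } P \subset P_I^j \text{ and Proposition 4.0.5)} \\
&\le \frac{1}{1 - n^{-2.9d+1}}.
\end{aligned}$$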
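So,

$$(14) \le \frac{1}{1 - n^{-2.9d+1}} \max_I \Pr_{P_I^j}\left[\mathrm{ang}(q, \triangle(A_{I-\{j\}})) < \epsilon \;\middle|\; \mathrm{optSimp}_q(a_1,\ldots,a_n) = I\right].$$

Plugging this bound in to the first inequality derived in the proof, we obtain the bound of

$$\begin{aligned}
\Pr_P\left[\mathrm{ang}_q(a_1,\ldots,a_n) < \epsilon\right]
&\le \frac{d}{\left(1 - n^{-2.9d+1}\right)^2} \max_{j, I} \Pr_{P_I^j}\left[\mathrm{ang}(q, \triangle(A_{I-\{j\}})) < \epsilon \;\middle|\; \mathrm{optSimp}_q(a_1,\ldots,a_n) = I\right] \\
&\le d\, \frac{9{,}372{,}424\; n d^2}{\sigma^6}\, \epsilon, \quad \text{by Lemma 4.0.11, } d \ge 3 \text{ and } n \ge d + 1, \\
&= \frac{9{,}372{,}424\; n d^3}{\sigma^6}\, \epsilon.
\end{aligned}$$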



Definition 4.0.10 ($Q$) We define $Q$ to be the set of $(b_1, \ldots, b_d)$ in $\mathbb{R}^{d-1}$ satisfying

(1) $\mathrm{dist}(b_1, \mathrm{Aff}(b_2,\ldots,b_d)) \le 4$,

(2) $\mathrm{dist}(b_i, b_j) \le 4$ for all $i, j \ge 2$,

(3) $\mathrm{dist}(b_1^\perp, b_i) \le 4$ for all $i \ge 2$, where $b_1^\perp$ is the orthogonal projection of $b_1$ onto
$\mathrm{Aff}(b_2,\ldots,b_d)$, and

(4) $0 \in \triangle(b_1,\ldots,b_d)$.

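Lemma 4.0.11 (Angle bound given optSimp) Let $\mu_1, \ldots, \mu_n$ be Gaussian measures in
$\mathbb{R}^d$ of standard deviation $\sigma \le 1/3\sqrt{d \ln n}$ centered at points of norm at most 1. Then

$$\Pr_{P^1_{\{1,\ldots,d\}}}\left[\mathrm{ang}(q, \triangle(a_2,\ldots,a_d)) < \epsilon \;\middle|\; \mathrm{optSimp}_q(a_1,\ldots,a_n) = \{1,\ldots,d\}\right] \le \frac{9{,}371{,}990\; n d^2 \epsilon}{\sigma^6}, \qquad (15)$$

where $a_1, \ldots, a_n$ have density $\prod_{i=1}^n \mu_i(a_i)$.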

Proof We begin by making the change of variables from $a_1, \ldots, a_d$ to $\omega, s, b_1, \ldots, b_d$
described in Corollary 2.5.3, and we recall that the Jacobian of this change of variables is

$$(d-1)!\, \langle \omega | q \rangle\, \mathrm{Vol}(\triangle(b_1,\ldots,b_d)).$$

As this change of variables is arranged so that $sq \in \triangle(a_1,\ldots,a_d)$ if and only if $0 \in
\triangle(b_1,\ldots,b_d)$, the condition that $\mathrm{optSimp}_q(a_1,\ldots,a_n) = \{1,\ldots,d\}$ can be expressed as

$$\left[0 \in \triangle(b_1,\ldots,b_d)\right] \prod_{j > d} \left[\langle \omega | a_j \rangle \le s \langle \omega | q \rangle\right].$$
Let $x$ be any point on $\triangle(a_2,\ldots,a_d)$. Given that $sq \in \triangle(a_1,\ldots,a_d)$, conditions (3)
and (4) for membership in $P^1_{\{1,\ldots,d\}}$ imply that

$$\mathrm{dist}(sq, x) \le \mathrm{dist}(a_1, x) \le \sqrt{\mathrm{dist}(a_1, \mathrm{Aff}(a_2,\ldots,a_d))^2 + \mathrm{dist}(a_1^\perp, x)^2} \le 4\sqrt{2},$$

where $a_1^\perp$ is the orthogonal projection of $a_1$ onto $\mathrm{Aff}(a_2,\ldots,a_d)$. So, Lemma 4.0.12 implies

$$\mathrm{ang}(q, \triangle(a_2,\ldots,a_d)) \ge \frac{\mathrm{dist}(sq, \mathrm{Aff}(a_2,\ldots,a_d))\, \langle \omega | q \rangle}{2 + 4\sqrt{2}} = \frac{\mathrm{dist}(0, \mathrm{Aff}(b_2,\ldots,b_d))\, \langle \omega | q \rangle}{2 + 4\sqrt{2}}.$$



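Finally, observe that $(a_1,\ldots,a_d) \in P^1_{\{1,\ldots,d\}}$ is equivalent to the conditions $(b_1,\ldots,b_d) \in
Q$ and $s \le 2$, given that $\mathrm{optSimp}_q(a_1,\ldots,a_n) = \{1,\ldots,d\}$. Now, the left-hand side of
(15) can be bounded by

$$\Pr_{\omega,\, s \le 2,\, (b_1,\ldots,b_d) \in Q}\left[\frac{\mathrm{dist}(0, \mathrm{Aff}(b_2,\ldots,b_d))\, \langle \omega | q \rangle}{2 + 4\sqrt{2}} < \epsilon\right], \qquad (16)$$

where the variables have density proportional to

$$\langle \omega | q \rangle\, \mathrm{Vol}(\triangle(b_1,\ldots,b_d)) \left(\prod_{j > d} \int_{a_j} \left[\langle \omega | a_j \rangle \le s \langle \omega | q \rangle\right] \mu_j(a_j)\, da_j\right) \prod_{i=1}^d \mu_i(R_\omega b_i + sq).$$

As Lemma 4.1.1 implies

$$\Pr_{(b_1,\ldots,b_d) \in Q}\left[\mathrm{dist}(0, \mathrm{Aff}(b_2,\ldots,b_d)) < \epsilon\right] \le \frac{900\, e^{2/3} d^2 \epsilon}{\sigma^4},$$

and Lemma 4.2.1 implies

$$\max_{s \le 2,\; b_1,\ldots,b_d \in Q}\; \Pr_\omega\left[\langle \omega | q \rangle < \epsilon\right] \le \left(\frac{340\, n \epsilon}{\sigma^2}\right)^2,$$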



we can apply Lemma 2.3.5 to prove 



. . /900e2/3d2\ /34o„\ 9, 371, 990 nd^e 



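Lemma 4.0.12 (Division into distance and angle) Let $x$ be a vector, let $0 < s \le 2$,
and let $q$ and $\omega$ be unit vectors satisfying

(a) $\langle \omega | x - sq \rangle = 0$, and

(b) $\mathrm{dist}(x, sq) \le 4\sqrt{2}$.

Then,

$$\mathrm{angle}(q, x) \ge \frac{\mathrm{dist}(x, sq)\, \langle \omega | q \rangle}{2 + 4\sqrt{2}}.$$

Proof Let $r = x - sq$. Then, (a) implies

$$\left\langle \frac{r}{\|r\|} \Big| q \right\rangle^2 + \langle \omega | q \rangle^2 \le \|q\|^2 = 1;$$

so,

$$\langle r | q \rangle^2 \le \left(1 - \langle \omega | q \rangle^2\right) \|r\|^2.$$

Let $h$ be the distance from $x$ to the ray through $q$. Then,

$$h^2 + \langle r | q \rangle^2 = \|r\|^2;$$

so,

$$h \ge \langle \omega | q \rangle\, \|r\| = \langle \omega | q \rangle\, \mathrm{dist}(x, sq).$$

Now,

$$\mathrm{angle}(q, x) \ge \sin(\mathrm{angle}(q, x)) = \frac{h}{\|x\|} \ge \frac{h}{s + \mathrm{dist}(x, sq)} \ge \frac{h}{2 + 4\sqrt{2}} \ge \frac{\langle \omega | q \rangle\, \mathrm{dist}(x, sq)}{2 + 4\sqrt{2}}.$$

■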
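The inequality of Lemma 4.0.12 is easy to sanity-check numerically. The sketch below, which is illustrative only, draws random instances in $\mathbb{R}^3$ satisfying hypotheses (a) and (b) and confirms that the angle is never smaller than the stated bound. The function names, the dimension 3, the sampling ranges, and the floating-point tolerance are our own assumptions.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def random_unit(rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    n = norm(v)
    return [a / n for a in v]


def lemma_holds(trials=2000, seed=1):
    """Check angle(q, x) >= dist(x, sq) <omega|q> / (2 + 4 sqrt 2) on random inputs."""
    rng = random.Random(seed)
    c = 2.0 + 4.0 * math.sqrt(2.0)
    for _ in range(trials):
        q = random_unit(rng)
        s = rng.uniform(0.01, 2.0)
        length = rng.uniform(0.01, 4.0 * math.sqrt(2.0))
        r = [length * a for a in random_unit(rng)]   # r = x - sq, so (b) holds
        w = cross(r, random_unit(rng))               # orthogonal to r, so (a) holds
        wn = norm(w)
        w = [a / wn for a in w]
        if dot(w, q) < 0:                            # orient so that <omega|q> >= 0
            w = [-a for a in w]
        x = [s * qi + ri for qi, ri in zip(q, r)]
        angle = math.atan2(norm(cross(q, x)), dot(q, x))
        bound = norm(r) * dot(w, q) / c
        if angle < bound - 1e-9:
            return False
    return True
```

The cross-product form of the angle is used because it stays accurate for small angles, where `acos` would lose precision.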



4.1 Distance 

The goal of this section is to prove it is unlikely that $0$ is near $\partial\triangle(b_1,\ldots,b_d)$.




Figure 3: The change of variables in Lemma 4.1.2. 
Lemma 4.1.1 (Distance bound) Let $q$ be a unit vector and let $\mu_1, \ldots, \mu_n$ be Gaussian
measures in $\mathbb{R}^d$ of standard deviation $\sigma \le 1/3\sqrt{d \ln n}$ centered at points of norm at most 1.
Then,

$$\Pr_{\omega,\, s \le 2,\, (b_1,\ldots,b_d) \in Q}\left[\mathrm{dist}(0, \mathrm{Aff}(b_2,\ldots,b_d)) < \epsilon\right] \le \frac{900\, e^{2/3} d^2 \epsilon}{\sigma^4}, \qquad (17)$$

where the variables have density proportional to

$$\langle \omega | q \rangle\, \mathrm{Vol}(\triangle(b_1,\ldots,b_d)) \left(\prod_{j > d} \int_{a_j} \left[\langle \omega | a_j \rangle \le s \langle \omega | q \rangle\right] \mu_j(a_j)\, da_j\right) \prod_{i=1}^d \mu_i(R_\omega b_i + sq).$$

Proof Note that if we fix $\omega$ and $s$, then the first and third terms in the density become
constant. For any fixed plane specified by $(\omega, s)$, Proposition 2.4.3 tells us that the induced
density on $b_i$ remains a Gaussian of standard deviation $\sigma$ and is centered at the projection
of the center of $\mu_i$ onto the plane. As the origin of this plane is the point $sq$, and $s \le 2$,
these induced Gaussians have centers of norm at most 3. Thus, we can use Lemma 4.1.2 to
bound the left-hand side of (17) by

$$\max_{\omega,\, s \le 2}\; \Pr_{(b_1,\ldots,b_d) \in Q}\left[\mathrm{dist}(0, \mathrm{Aff}(b_2,\ldots,b_d)) < \epsilon\right] \le \frac{900\, e^{2/3} d^2 \epsilon}{\sigma^4}.$$

■



Lemma 4.1.2 (Distance bound in plane) Let $\mu_1, \ldots, \mu_d$ be Gaussian measures in $\mathbb{R}^{d-1}$
of standard deviation $\sigma \le 1/3\sqrt{d \ln n}$ centered at points of norm at most 3. Then

$$\Pr_{(b_1,\ldots,b_d) \in Q}\left[\mathrm{dist}(0, \mathrm{Aff}(b_2,\ldots,b_d)) < \epsilon\right] \le \frac{900\, e^{2/3} d^2 \epsilon}{\sigma^4}, \qquad (18)$$

where $b_1, \ldots, b_d$ have density proportional to

$$\mathrm{Vol}(\triangle(b_1,\ldots,b_d)) \prod_{i=1}^d \mu_i(b_i).$$

Proof In Lemma 4.1.3, we will prove it is unlikely that $b_1$ is close to $\mathrm{Aff}(b_2,\ldots,b_d)$.
We will exploit this fact by proving that it is unlikely that $0$ is much closer than $b_1$ to
$\mathrm{Aff}(b_2,\ldots,b_d)$. We do this by fixing the shape of $\triangle(b_1,\ldots,b_d)$, and then considering
slight translations of this simplex. That is, we make a change of variables to

$$h = \frac{1}{d} \sum_{i=1}^d b_i, \qquad d_i = h - b_i, \text{ for } i \ge 2.$$

The vectors $d_2, \ldots, d_d$ specify the shape of the simplex, and $h$ specifies its location. As this
change of variables is a linear transformation, its Jacobian is constant. For convenience, we
also define $d_1 = h - b_1 = -\sum_{i \ge 2} d_i$.
It is easy to verify that

$$\begin{aligned}
0 \in \triangle(b_1,\ldots,b_d) &\iff h \in \triangle(d_1,\ldots,d_d), \\
\mathrm{dist}(0, \mathrm{Aff}(b_2,\ldots,b_d)) &= \mathrm{dist}(h, \mathrm{Aff}(d_2,\ldots,d_d)), \\
\mathrm{dist}(b_1, \mathrm{Aff}(b_2,\ldots,b_d)) &= \mathrm{dist}(d_1, \mathrm{Aff}(d_2,\ldots,d_d)), \text{ and} \\
\mathrm{Vol}(\triangle(b_1,\ldots,b_d)) &= \mathrm{Vol}(\triangle(d_1,\ldots,d_d)).
\end{aligned}$$

Note that the relation between $d_1$ and $d_2, \ldots, d_d$ guarantees $0 \in \triangle(d_1,\ldots,d_d)$ for all
$d_2, \ldots, d_d$. So, $(b_1,\ldots,b_d) \in Q$ if and only if $(d_1,\ldots,d_d) \in Q$ and $h \in \triangle(d_1,\ldots,d_d)$. As
$d_1$ is a function of $d_2, \ldots, d_d$, we let $Q'$ be the set of $d_2, \ldots, d_d$ for which $(d_1,\ldots,d_d) \in Q$.
So, the left-hand side of (18) equals

$$\Pr_{(d_2,\ldots,d_d) \in Q',\; h \in \triangle(d_1,\ldots,d_d)}\left[\mathrm{dist}(h, \mathrm{Aff}(d_2,\ldots,d_d)) < \epsilon\right],$$

where $h, d_2, \ldots, d_d$ have density proportional to

$$\mathrm{Vol}(\triangle(d_1,\ldots,d_d)) \prod_{i=1}^d \mu_i(h - d_i). \qquad (19)$$
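The identities for the change of variables from $(b_1,\ldots,b_d)$ to $(h, d_2, \ldots, d_d)$ can be checked on a concrete example. The following sketch works in the plane (the case $d = 3$, so the $b_i$ are points in $\mathbb{R}^2$ and the affine hull of $b_2, b_3$ is a line); the helper names are ours and the example is only a sanity check, not part of the argument.

```python
import math


def dist_point_to_line(p, a, b):
    """Distance from point p to the line through a and b in the plane."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return abs(cross) / math.hypot(b[0] - a[0], b[1] - a[1])


def change_of_variables(b):
    """Return (h, [d_1, ..., d_d]) with h the centroid and d_i = h - b_i."""
    d = len(b)
    h = tuple(sum(p[k] for p in b) / d for k in range(2))
    ds = [tuple(h[k] - p[k] for k in range(2)) for p in b]
    return h, ds
```

For any three points, the $d_i$ sum to zero (so $d_1 = -\sum_{i \ge 2} d_i$), and the two distance identities hold because $\mathrm{Aff}(d_2, d_3)$ is the reflection of $\mathrm{Aff}(b_2, b_3)$ through the origin followed by a translation by $h$.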

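Similarly, Lemma 4.1.3 can be seen to imply

$$\Pr_{(d_2,\ldots,d_d) \in Q',\; h \in \triangle(d_1,\ldots,d_d)}\left[\mathrm{dist}(d_1, \mathrm{Aff}(d_2,\ldots,d_d)) < \epsilon\right] \le \left(\frac{3 \epsilon\, e^{2/3} d}{\sigma^2}\right)^3 \qquad (20)$$

under density proportional to (19). We take advantage of (20) by proving

$$\max_{d_2,\ldots,d_d \in Q'}\; \Pr_{h \in \triangle(d_1,\ldots,d_d)}\left[\frac{\mathrm{dist}(h, \mathrm{Aff}(d_2,\ldots,d_d))}{\mathrm{dist}(d_1, \mathrm{Aff}(d_2,\ldots,d_d))} < \epsilon\right] \le \frac{75\, d \epsilon}{\sigma^2}, \qquad (21)$$

where $h$ has density proportional to

$$\prod_{i=1}^d \mu_i(h - d_i).$$

Before proving (21), we point out that using Lemma 2.3.5 to combine (20) and (21), we
obtain

$$\Pr_{(d_2,\ldots,d_d) \in Q',\; h \in \triangle(d_1,\ldots,d_d)}\left[\mathrm{dist}(h, \mathrm{Aff}(d_2,\ldots,d_d)) < \epsilon\right] \le \frac{900\, e^{2/3} d^2 \epsilon}{\sigma^4},$$

from which the lemma follows.
To prove (21), we let

$$U_\epsilon = \left\{h \in \triangle(d_1,\ldots,d_d) : \frac{\mathrm{dist}(h, \mathrm{Aff}(d_2,\ldots,d_d))}{\mathrm{dist}(d_1, \mathrm{Aff}(d_2,\ldots,d_d))} \ge \epsilon\right\},$$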
and we set $\nu(h) = \prod_{i=1}^d \mu_i(h - d_i)$. Under this notation, the probability in (21) is equal to

$$(\nu(U_0) - \nu(U_\epsilon))/\nu(U_0).$$

To bound this ratio, we construct an isomorphism from $U_0$ to $U_\epsilon$. The natural isomorphism,
which we denote $\Phi_\epsilon$, is the map that contracts the simplex by a factor of $(1 - \epsilon)$ at $d_1$.
To use this isomorphism to compare the measures of the sets, we use the facts that for
$d_1, \ldots, d_d \in Q$ and $h \in \triangle(d_1,\ldots,d_d)$,

(a) $\|h - d_i\| \le \max_{i,j} \|d_i - d_j\| \le 4\sqrt{2}$, so the distance from $h - d_i$ to the center of its
distribution is at most $\|h - d_i\| + 3 \le 4\sqrt{2} + 3$;

(b) $\mathrm{dist}(h, \Phi_\epsilon(h)) \le \epsilon \max_i \mathrm{dist}(d_1, d_i) \le 4\sqrt{2}\, \epsilon$

to apply Lemma 2.4.2 to show that for all $h \in \triangle(d_1,\ldots,d_d)$,

$$\frac{\mu_i(\Phi_\epsilon(h) - d_i)}{\mu_i(h - d_i)} \ge e^{-\frac{3 \cdot 4\sqrt{2}\,(4\sqrt{2} + 3)\,\epsilon}{2\sigma^2}} = e^{-\frac{(48 + 18\sqrt{2})\,\epsilon}{\sigma^2}}.$$



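So,

$$\min_{h \in \triangle(d_1,\ldots,d_d)} \frac{\nu(\Phi_\epsilon(h))}{\nu(h)} = \min_{h \in \triangle(d_1,\ldots,d_d)} \prod_{i=1}^d \frac{\mu_i(\Phi_\epsilon(h) - d_i)}{\mu_i(h - d_i)} \ge e^{-\frac{(48 + 18\sqrt{2})\, d \epsilon}{\sigma^2}} \ge 1 - \frac{(48 + 18\sqrt{2})\, d \epsilon}{\sigma^2}. \qquad (22)$$

As the Jacobian

$$\left|\frac{\partial \Phi_\epsilon(h)}{\partial h}\right| = (1 - \epsilon)^d \ge 1 - d\epsilon,$$

using the change of variables $x = \Phi_\epsilon(h)$ we can compute

$$\nu(U_\epsilon) = \int_{x \in U_\epsilon} \nu(x)\, dx = \int_{h \in U_0} \nu(\Phi_\epsilon(h)) \left|\frac{\partial \Phi_\epsilon(h)}{\partial h}\right| dh \ge (1 - d\epsilon) \int_{h \in U_0} \nu(\Phi_\epsilon(h))\, dh. \qquad (23)$$

So,

$$\begin{aligned}
\frac{\nu(U_\epsilon)}{\nu(U_0)} &\ge \frac{(1 - d\epsilon) \int_{h \in U_0} \nu(\Phi_\epsilon(h))\, dh}{\int_{h \in U_0} \nu(h)\, dh} && \text{by (23)} \\
&\ge (1 - d\epsilon) \min_{h \in \triangle(d_1,\ldots,d_d)} \frac{\nu(\Phi_\epsilon(h))}{\nu(h)} \\
&\ge (1 - d\epsilon) \left(1 - \frac{(48 + 18\sqrt{2})\, d \epsilon}{\sigma^2}\right) && \text{by (22)} \\
&\ge 1 - \frac{75\, d \epsilon}{\sigma^2} && \text{as } \sigma \le 1.
\end{aligned}$$

(21) now follows from $(\nu(U_0) - \nu(U_\epsilon))/\nu(U_0) \le \frac{75\, d \epsilon}{\sigma^2}$. ■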
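The constants in the proof of (21) can be verified mechanically. The sketch below checks that the exponent produced by Lemma 2.4.2 simplifies to $48 + 18\sqrt{2}$, and that the product of the two lower bounds stays above $1 - 75 d\epsilon/\sigma^2$ whenever $\sigma \le 1$. The function names and the grid of test values are our own.

```python
import math

# Constant appearing in (22).
K = 48.0 + 18.0 * math.sqrt(2.0)


def exponent_constant() -> float:
    """3 * 4*sqrt(2) * (4*sqrt(2) + 3) / 2, the exponent coefficient before simplification."""
    return 3.0 * 4.0 * math.sqrt(2.0) * (4.0 * math.sqrt(2.0) + 3.0) / 2.0


def ratio_lower_bound(d: int, eps: float, sigma: float) -> float:
    """(1 - d*eps) * (1 - K*d*eps/sigma^2), the product bounded in the last display."""
    return (1.0 - d * eps) * (1.0 - K * d * eps / sigma**2)
```

Since $K + 1 \approx 74.46 \le 75$ and the cross term is non-negative, the final inequality holds with a little room to spare.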




Figure 4: The change of variables in Lemma 4.1.3. 

Lemma 4.1.3 (Height of simplex) Let $\mu_1, \ldots, \mu_d$ be Gaussian measures in $\mathbb{R}^{d-1}$ of standard
deviation $\sigma \le 1/3\sqrt{d \ln n}$ centered at points of norm at most 3. Then

$$\Pr_{b_1,\ldots,b_d \in Q}\left[\mathrm{dist}(b_1, \mathrm{Aff}(b_2,\ldots,b_d)) < \epsilon\right] \le \left(\frac{3 \epsilon\, e^{2/3} d}{\sigma^2}\right)^3,$$

where $b_1, \ldots, b_d$ have density proportional to

$$\mathrm{Vol}(\triangle(b_1,\ldots,b_d)) \prod_{i=1}^d \mu_i(b_i).$$

Proof We begin with a simplifying change of variables. As in Theorem 2.5.2, we let

$$(b_2, \ldots, b_d) = (R_\tau c_2 + t\tau, \ldots, R_\tau c_d + t\tau),$$

where $\tau \in S^{d-2}$ and $t \ge 0$ specify the plane through $b_2, \ldots, b_d$, and $c_2, \ldots, c_d \in \mathbb{R}^{d-2}$
denote the local coordinates of these points on that plane. Recall that the Jacobian of
this change of variables is $\mathrm{Vol}(\triangle(c_2,\ldots,c_d))$. Let $l = -\langle \tau | b_1 \rangle$, and let $c_1$ denote the
coordinates in $\mathbb{R}^{d-2}$ of the projection of $b_1$ onto the plane specified by $\tau$ and $t$. Note that
$l \ge 0$. In this notation, we have

$$\mathrm{dist}(b_1, \mathrm{Aff}(b_2,\ldots,b_d)) = l + t.$$

The Jacobian of the change from $b_1$ to $(l, c_1)$ is 1 as the transformation is just an orthogonal
change of coordinates. The conditions for $(b_1,\ldots,b_d) \in Q$ translate into the conditions
(a) $\mathrm{dist}(c_i, c_j) \le 4$ for all $i \ne j$;

(b) $(l + t) \le 4$; and

(c) $0 \in \triangle(b_1,\ldots,b_d)$.

Let $R$ denote the set of $c_1, \ldots, c_d$ satisfying the first condition. As the lemma is vacuously
true for $\epsilon > 4$, we will drop the second condition and note that doing so cannot decrease
the probability that $(t + l) < \epsilon$. Thus, our goal is to bound

$$\Pr_{\tau,\, t,\, l,\, (c_1,\ldots,c_d) \in R}\left[(l + t) < \epsilon\right], \qquad (24)$$

where the variables have density proportional to¹

$$\left[0 \in \triangle(b_1,\ldots,b_d)\right] \mathrm{Vol}(\triangle(b_1,\ldots,b_d))\, \mathrm{Vol}(\triangle(c_2,\ldots,c_d)) \prod_{i=1}^d \mu_i(b_i).$$

As $\mathrm{Vol}(\triangle(b_1,\ldots,b_d)) = (l + t)\, \mathrm{Vol}(\triangle(c_2,\ldots,c_d))/d$, this is the same as having density
proportional to

$$(l + t) \left[0 \in \triangle(b_1,\ldots,b_d)\right] \mathrm{Vol}(\triangle(c_2,\ldots,c_d))^2 \prod_{i=1}^d \mu_i(b_i).$$

Under a suitable system of coordinates, we can express $b_1 = (-l, c_1)$ and $b_i = (t, c_i)$ for
$i \ge 2$. The key idea of this proof is that multiplying the first coordinates of these points by a
constant does not change whether or not $0 \in \triangle(b_1,\ldots,b_d)$; so, we can determine whether
$0 \in \triangle(b_1,\ldots,b_d)$ from the data $(l/t, c_1, \ldots, c_d)$. Thus, we will introduce a new variable
$\alpha$, set $l = \alpha t$, and let $S$ denote the set of $(\alpha, c_1, \ldots, c_d)$ for which $0 \in \triangle(b_1,\ldots,b_d)$ and
$(c_1, \ldots, c_d) \in R$. This change of variables from $l$ to $\alpha$ incurs a Jacobian of $\frac{\partial l}{\partial \alpha} = t$, so (24)
equals

$$\Pr_{\tau,\, t,\, (\alpha, c_1,\ldots,c_d) \in S}\left[(1 + \alpha) t < \epsilon\right],$$

where the variables have density proportional to

$$t^2 (1 + \alpha)\, \mathrm{Vol}(\triangle(c_2,\ldots,c_d))^2\, \mu_1(-\alpha t, c_1) \prod_{i=2}^d \mu_i(t, c_i).$$

We upper bound this probability by

$$\max_{\tau,\, (\alpha, c_1,\ldots,c_d) \in S}\; \Pr_t\left[(1 + \alpha) t < \epsilon\right] \le \max_{\tau,\, (\alpha, c_1,\ldots,c_d) \in S}\; \Pr_t\left[\max(1, \alpha)\, t < \epsilon\right],$$

where $t$ has density proportional to

$$t^2\, \mu_1(-\alpha t, c_1) \prod_{i=2}^d \mu_i(t, c_i).$$

¹While we keep terms such as $b_1$ in the expression of the density, they should be interpreted as functions
of $\tau, t, l, c_1, \ldots, c_d$.
For $c_1, \ldots, c_d$ fixed, the points $(-\alpha t, c_1), (t, c_2), \ldots, (t, c_d)$ become univariate Gaussians of
standard deviation $\sigma$ and mean of absolute value at most 3. Let $t_0 = \sigma^2/(3 \max(1, \alpha)\, d)$.
Then, for $t$ in the range $[0, t_0]$, $-\alpha t$ is at most $3 + \alpha t_0$ from the mean of the first distribution
and $t$ is at most $3 + t_0$ from the means of the other distributions. We will now observe that
if $t$ is restricted to a sufficiently small domain, then the densities of these Gaussians will
have bounded variation. In particular, Lemma 2.4.2 implies that



\i=2 / \i=2 



\i=2 J \i=2 



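Thus, we can now apply Lemma 2.3.7 to show that

$$\Pr_t\left[t < \epsilon\right] \le e^2 \left(\frac{3 \epsilon \max(1, \alpha)\, d}{\sigma^2}\right)^3,$$

from which we conclude

$$\Pr_t\left[\max(1, \alpha)\, t < \epsilon\right] \le e^2 \left(\frac{3 \epsilon d}{\sigma^2}\right)^3 = \left(\frac{3 \epsilon\, e^{2/3} d}{\sigma^2}\right)^3.$$

■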



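4.2 Angle of $q$ to $\omega$

Lemma 4.2.1 (Angle of incidence) Let $d \ge 3$ and $n > d$. Let $\mu_1, \ldots, \mu_n$ be Gaussian
densities in $\mathbb{R}^d$ of standard deviation $\sigma$ centered at points of norm at most 1 in $\mathbb{R}^d$. Let
$s \le 2$ and let $(b_1, \ldots, b_d) \in Q$. Then,

$$\Pr_\omega\left[\langle \omega | q \rangle < \epsilon\right] \le \left(\frac{340\, \epsilon n}{\sigma^2}\right)^2, \qquad (25)$$

where $\omega$ has density proportional to

$$\langle \omega | q \rangle \left(\prod_{j > d} \int_{a_j} \left[\langle \omega | a_j \rangle \le s \langle \omega | q \rangle\right] \mu_j(a_j)\, da_j\right) \prod_{i=1}^d \mu_i(R_\omega b_i + sq).$$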
Proof First note that the conditions for $(b_1, \ldots, b_d)$ to be in $Q$ imply that for $1 \le i \le d$,
$b_i$ has norm at most $\sqrt{(4)^2 + (4)^2} = 4\sqrt{2}$ by properties (1), (3) and (4) of $Q$.

As in Proposition 2.5.4, we change $\omega$ to $(c, \psi)$, where $c = \langle \omega | q \rangle$ and $\psi \in S^{d-2}$. The
Jacobian of this change of variables is

$$(1 - c^2)^{(d-3)/2}.$$

In these variables, the bound follows from Lemma 4.2.2. ■



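Lemma 4.2.2 (Angle of incidence, II) Let $d \ge 3$ and $n > d$. Let $\mu_{d+1}, \ldots, \mu_n$ be Gaussian
densities in $\mathbb{R}^d$ of standard deviation $\sigma$ centered at points of norm at most 1 in $\mathbb{R}^d$.
Let $s \le 2$, and let $b_1, \ldots, b_d$ each have norm at most $4\sqrt{2}$. Let $\psi \in S^{d-2}$. Then

$$\Pr_c\left[c < \epsilon\right] \le \left(\frac{340\, \epsilon n}{\sigma^2}\right)^2,$$

where $c$ has density proportional to

$$(1 - c^2)^{(d-3)/2} \cdot c \cdot \left(\prod_{j > d} \int_{a_j} \left[\langle \omega_{\psi,c} | a_j \rangle \le s \langle \omega_{\psi,c} | q \rangle\right] \mu_j(a_j)\, da_j\right) \prod_{i=1}^d \mu_i(R_{\omega_{\psi,c}} b_i + sq). \qquad (26)$$

Proof Let

$$\begin{aligned}
\nu_1(c) &= (1 - c^2)^{(d-3)/2}, \\
\nu_2(c) &= \prod_{j > d} \int_{a_j} \left[\langle \omega_{\psi,c} | a_j \rangle \le s \langle \omega_{\psi,c} | q \rangle\right] \mu_j(a_j)\, da_j, \text{ and} \\
\nu_3(c) &= \prod_{i=1}^d \mu_i(R_{\omega_{\psi,c}} b_i + sq).
\end{aligned}$$

Then, the density of $c$ is proportional to

$$(26) = c \cdot \nu_1(c)\, \nu_2(c)\, \nu_3(c).$$

Let

$$c_0 = \frac{\sigma^2}{240\, n}. \qquad (27)$$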

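We will show that, for $c$ between $0$ and $c_0$, the density will vary by a factor no greater than
2. We begin by letting $\theta_0 = \pi/2 - \arccos(c_0)$, and noticing that a simple plot of the arccos
function reveals $c_0 < 1/26$ implies

$$\theta_0 \le 1.001\, c_0. \qquad (28)$$

So, as $c$ varies in the range $[0, c_0]$, $\omega_{\psi,c}$ travels in an arc of angle at most $\theta_0$ and therefore
travels a distance at most $\theta_0$. As $c = \langle q | \omega_{\psi,c} \rangle$, we can apply Lemma 4.2.3 to show

$$\frac{\min_{0 \le c \le c_0} \nu_2(c)}{\max_{0 \le c \le c_0} \nu_2(c)} \ge 1 - \frac{8 n (1 + s)\, \theta_0}{3 \sigma^2} \ge 1 - \frac{24\, n \theta_0}{3 \sigma^2} \ge 1 - \frac{1.001}{30}, \qquad (29)$$

by (27) and (28).

We similarly note that as $c$ varies between $0$ and $c_0$, the point $R_{\omega_{\psi,c}} b_i + sq$ moves a
distance of at most

$$\theta_0 \|b_i\| \le 4\sqrt{2}\, \theta_0.$$

As this point is at distance at most

$$1 + s + \|b_i\| \le 4\sqrt{2} + 3$$

from the center of $\mu_i$, Lemma 2.4.2 implies

$$\frac{\min_{0 \le c \le c_0} \mu_i(R_{\omega_{\psi,c}} b_i + sq)}{\max_{0 \le c \le c_0} \mu_i(R_{\omega_{\psi,c}} b_i + sq)} \ge e^{-\left(3 (4\sqrt{2} + 3)\, 4\sqrt{2}\, \theta_0\right)/2\sigma^2} \ge e^{-147\, \theta_0/\sigma^2}.$$

So,

$$\frac{\min_{0 \le c \le c_0} \nu_3(c)}{\max_{0 \le c \le c_0} \nu_3(c)} \ge e^{-147\, d\, \theta_0/\sigma^2} \ge e^{-148/240}, \qquad (30)$$

by (27) and (28) and $d \le n$.
Finally, we note that

$$1 \ge \nu_1(c) = (1 - c^2)^{(d-3)/2} \ge (1 - 1/26d)^{(d-3)/2} \ge 1 - \frac{1}{52}. \qquad (31)$$

So, combining equations (29), (30), and (31), we obtain

$$\frac{\min_{0 \le c \le c_0} \nu_1(c)\, \nu_2(c)\, \nu_3(c)}{\max_{0 \le c \le c_0} \nu_1(c)\, \nu_2(c)\, \nu_3(c)} \ge \left(1 - \frac{1}{52}\right) e^{-148/240} \left(1 - \frac{1.001}{30}\right) \ge \frac{1}{2}.$$

We conclude by using Lemma 2.3.7 to show

$$\Pr_c\left[c < \epsilon\right] \le 2 (\epsilon/c_0)^2 = 2 \left(\frac{240\, \epsilon n}{\sigma^2}\right)^2 \le \left(\frac{340\, \epsilon n}{\sigma^2}\right)^2.$$

■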
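The numerical claims in the proof of Lemma 4.2.2 can be checked directly. The sketch below evaluates the combined lower bound on the density ratio, the small-angle estimate (28), and the final comparison of constants; the function names and sample values of $c_0$ are our own.

```python
import math


def theta0(c0: float) -> float:
    """theta_0 = pi/2 - arccos(c_0), as defined before (28)."""
    return math.pi / 2.0 - math.acos(c0)


def density_ratio_floor() -> float:
    """Product of the three lower bounds from (29), (30) and (31)."""
    return (1.0 - 1.0 / 52.0) * math.exp(-148.0 / 240.0) * (1.0 - 1.001 / 30.0)
```

The product evaluates to roughly 0.51, so the density varies by a factor of at most 2 on $[0, c_0]$, and $2 \cdot 240^2 = 115200 \le 340^2 = 115600$ justifies replacing $2(240\,\epsilon n/\sigma^2)^2$ by $(340\,\epsilon n/\sigma^2)^2$.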



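Lemma 4.2.3 (Points under plane) For $n > d$, let $\mu_{d+1}, \ldots, \mu_n$ be Gaussian distributions
in $\mathbb{R}^d$ of standard deviation $\sigma$ centered at points of norm at most 1. Let $s \ge 0$ and let
$\omega_1$ and $\omega_2$ be unit vectors such that $\langle \omega_1 | q \rangle$ and $\langle \omega_2 | q \rangle$ are non-negative. Then,

$$\frac{\prod_{j > d} \int_{a_j} \left[\langle \omega_2 | a_j \rangle \le s \langle \omega_2 | q \rangle\right] \mu_j(a_j)\, da_j}{\prod_{j > d} \int_{a_j} \left[\langle \omega_1 | a_j \rangle \le s \langle \omega_1 | q \rangle\right] \mu_j(a_j)\, da_j} \ge 1 - \frac{8 n (1 + s) \|\omega_1 - \omega_2\|}{3 \sigma^2}.$$

Proof As the integrals in the statement of the lemma are just the integrals of Gaussian
measures over half-spaces, they can be reduced to univariate integrals. If $\mu_j$ is centered at
$\bar{a}_j$, then

$$\begin{aligned}
\int_{a_j} \left[\langle \omega_1 | a_j \rangle \le s \langle \omega_1 | q \rangle\right] \mu_j(a_j)\, da_j
&= \left(\frac{1}{\sqrt{2\pi}\, \sigma}\right)^d \int_{g_j} \left[\langle \omega_1 | g_j \rangle \ge -\langle \omega_1 | sq - \bar{a}_j \rangle\right] e^{-\|g_j\|^2/2\sigma^2}\, dg_j \quad \text{(setting } g_j = \bar{a}_j - a_j\text{)} \\
&= \frac{1}{\sqrt{2\pi}\, \sigma} \int_{t = -\langle \omega_1 | sq - \bar{a}_j \rangle}^{t = \infty} e^{-t^2/2\sigma^2}\, dt \quad \text{(by Proposition 2.4.4)}.
\end{aligned}$$

As $\|\bar{a}_j\| \le 1$, we know

$$-\langle \omega_1 | sq - \bar{a}_j \rangle = -\langle \omega_1 | sq \rangle + \langle \omega_1 | \bar{a}_j \rangle \le \langle \omega_1 | \bar{a}_j \rangle \le 1. \qquad (32)$$

Similarly,

$$\begin{aligned}
\left| -\langle \omega_1 | sq - \bar{a}_j \rangle + \langle \omega_2 | sq - \bar{a}_j \rangle \right| &= \left| -\langle \omega_1 - \omega_2 | sq - \bar{a}_j \rangle \right| \\
&\le \|\omega_1 - \omega_2\|\, \|sq - \bar{a}_j\| \qquad (33) \\
&\le \|\omega_1 - \omega_2\| (s + 1). \qquad (34)
\end{aligned}$$

Thus, by applying Lemma 2.4.11 to (32) and (34), we obtain

$$\frac{\int_{a_j} \left[\langle \omega_2 | a_j \rangle \le s \langle \omega_2 | q \rangle\right] \mu_j(a_j)\, da_j}{\int_{a_j} \left[\langle \omega_1 | a_j \rangle \le s \langle \omega_1 | q \rangle\right] \mu_j(a_j)\, da_j} = \frac{\int_{t = -\langle \omega_2 | sq - \bar{a}_j \rangle}^{\infty} e^{-t^2/2\sigma^2}\, dt}{\int_{t = -\langle \omega_1 | sq - \bar{a}_j \rangle}^{\infty} e^{-t^2/2\sigma^2}\, dt} \ge 1 - \frac{8 (1 + s) \|\omega_1 - \omega_2\|}{3 \sigma^2}.$$

Thus,

$$\frac{\prod_{j > d} \int_{a_j} \left[\langle \omega_2 | a_j \rangle \le s \langle \omega_2 | q \rangle\right] \mu_j(a_j)\, da_j}{\prod_{j > d} \int_{a_j} \left[\langle \omega_1 | a_j \rangle \le s \langle \omega_1 | q \rangle\right] \mu_j(a_j)\, da_j} \ge \left(1 - \frac{8 (1 + s) \|\omega_1 - \omega_2\|}{3 \sigma^2}\right)^{n-d} \ge 1 - \frac{8 n (1 + s) \|\omega_1 - \omega_2\|}{3 \sigma^2}.$$

■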

4.3 Extending the shadow bound 

In this section, we relax the restrictions made in the statement of Theorem 4.0.1. The 
extensions of Theorem 4.0.1 are needed in the proof of Theorem 5.0.1. 

We begin by removing the restrictions on where the distributions are centered in the 
shadow bound. 
Corollary 4.3.1 ($\bar{a}_i$ free) Let $z$ and $t$ be unit vectors and let $a_1, \ldots, a_n$ be Gaussian
random vectors in $\mathbb{R}^d$ of standard deviation $\sigma \le 1/3\sqrt{d \ln n}$ centered at points $\bar{a}_1, \ldots, \bar{a}_n$.
Then,

$$\mathbf{E}\left[\left|\mathrm{Shadow}_{z,t}(a_1,\ldots,a_n)\right|\right] \le \mathcal{D}\left(d, n, \frac{\sigma}{\max(1, \max_i \|\bar{a}_i\|)}\right),$$

where $\mathcal{D}(d, n, \sigma)$ is as given in Theorem 4.0.1.

Proof Let $k = \max_i \|\bar{a}_i\|$. Assume without loss of generality that $k \ge 1$, and let $b_i = a_i/k$
for all $i$. Then, $b_i$ is a Gaussian random variable of standard deviation $(\sigma/k)$ centered at a
point of norm at most 1. So, Theorem 4.0.1 implies

$$\mathbf{E}\left[\left|\mathrm{Shadow}_{z,t}(b_1,\ldots,b_n)\right|\right] \le \mathcal{D}\left(d, n, \frac{\sigma}{k}\right).$$

On the other hand, the shadow of the polytope defined by the $b_i$s can be seen to be a dilation
of the polytope defined by the $a_i$s: the division of the $b_i$s by a factor of $k$ is equivalent to
the multiplication of $x$ by $k$. So, we may conclude that for all $a_1, \ldots, a_n$,

$$\left|\mathrm{Shadow}_{z,t}(a_1,\ldots,a_n)\right| = \left|\mathrm{Shadow}_{z,t}(b_1,\ldots,b_n)\right|.$$

■



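Corollary 4.3.2 (Gaussians free) Let $z$ and $t$ be unit vectors and let $a_1, \ldots, a_n$ be Gaussian
random vectors in $\mathbb{R}^d$ with covariance matrices $M_1, \ldots, M_n$ centered at points $\bar{a}_1, \ldots, \bar{a}_n$
respectively. If the eigenvalues of each $M_i$ lie between $\sigma^2$ and $1/9d \ln n$, then

$$\mathbf{E}\left[\left|\mathrm{Shadow}_{z,t}(a_1,\ldots,a_n)\right|\right] \le \mathcal{D}\left(d, n, \frac{\sigma}{1 + \max_i \|\bar{a}_i\|}\right) + 1,$$

where $\mathcal{D}(d, n, \sigma)$ is as given in Theorem 4.0.1.
Proof By Proposition 2.4.1, each $a_i$ can be expressed as

$$a_i = \bar{a}_i + \tilde{g}_i + g_i,$$

where $g_i$ is a Gaussian random vector of standard deviation $\sigma$ centered at the origin and
$\tilde{g}_i$ is a Gaussian random vector centered at the origin with covariance matrix $M_i^0 = M_i - \sigma^2 I$,
each of whose eigenvalues is at most $1/9d \ln n$. Let $\tilde{a}_i = \bar{a}_i + \tilde{g}_i$. If $\|\tilde{a}_i\| \le 1 + \|\bar{a}_i\|$ for all
$i$, then we can apply Corollary 4.3.1 to show

$$\mathbf{E}_{g_1,\ldots,g_n}\left[\left|\mathrm{Shadow}_{z,t}(a_1,\ldots,a_n)\right|\right] \le \mathcal{D}\left(d, n, \frac{\sigma}{\max(1, \max_i \|\tilde{a}_i\|)}\right) \le \mathcal{D}\left(d, n, \frac{\sigma}{1 + \max_i \|\bar{a}_i\|}\right).$$

On the other hand, Corollary 2.4.6 implies

$$\Pr_{\tilde{g}_1,\ldots,\tilde{g}_n}\left[\exists i : \|\tilde{a}_i\| > 1 + \|\bar{a}_i\|\right] \le 0.0015 \binom{n}{d}^{-1}.$$

So, using Lemma 2.3.3 and $\left|\mathrm{Shadow}_{z,t}(a_1,\ldots,a_n)\right| \le \binom{n}{d}$, we can show

$$\mathbf{E}_{\tilde{g}_1,\ldots,\tilde{g}_n}\left[\mathbf{E}_{g_1,\ldots,g_n}\left[\left|\mathrm{Shadow}_{z,t}(a_1,\ldots,a_n)\right|\right]\right] \le \mathcal{D}\left(d, n, \frac{\sigma}{1 + \max_i \|\bar{a}_i\|}\right) + 1,$$

from which the Corollary follows. ■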



Corollary 4.3.3 ($y_i$ free) Let $y \in \mathbb{R}^n$ be a positive vector. Let $z$ and $t$ be unit vectors
and let $a_1, \ldots, a_n$ be Gaussian random vectors in $\mathbb{R}^d$ with covariance matrices $M_1, \ldots, M_n$
centered at points $\bar{a}_1, \ldots, \bar{a}_n$, respectively. If the eigenvalues of each $M_i$ lie between $\sigma^2$ and
$1/9d \ln n$, then

$$\mathbf{E}\left[\left|\mathrm{Shadow}_{z,t}(a_1,\ldots,a_n; y)\right|\right] \le \mathcal{D}\left(d, n, \frac{\sigma}{(1 + \max_i \|\bar{a}_i\|)(\max_i y_i)/(\min_i y_i)}\right) + 1,$$

where $\mathcal{D}(d, n, \sigma)$ is as given in Theorem 4.0.1.

Proof Nothing in the statement is changed if we rescale the $y_i$s. So, assume without loss
of generality that $\min_i y_i = 1$.

Let $b_i = a_i/y_i$. Then $b_i$ is a Gaussian random vector with covariance matrix $M_i/y_i^2$
centered at a point of norm at most $\|\bar{a}_i/y_i\| \le \|\bar{a}_i\|$. Then, the eigenvalues of each $M_i/y_i^2$
lie between $\sigma^2/y_i^2$ and $1/(9d \ln n\, y_i^2) \le 1/9d \ln n$, so we may complete the proof by applying
Corollary 4.3.2. ■
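Each corollary in this section keeps the form of the bound and only shrinks the standard deviation that is fed into $\mathcal{D}$. The sketch below computes that effective standard deviation for Corollary 4.3.3; the function name and argument layout are our own, and the centers are passed as precomputed norms $\|\bar{a}_i\|$.

```python
def effective_sigma(sigma: float, center_norms, y) -> float:
    """Third argument of D in Corollary 4.3.3 (hypothetical helper name)."""
    if min(y) <= 0:
        raise ValueError("y must be a positive vector")
    spread = max(y) / min(y)
    return sigma / ((1.0 + max(center_norms)) * spread)
```

Setting every $y_i = 1$ recovers the argument used in Corollary 4.3.2, and rescaling $y$ by a positive constant leaves the value unchanged, which is the observation that opens the proof.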
5 Smoothed Analysis of a Two-Phase Simplex Algorithm



In this section, we will analyze the smoothed complexity of the two-phase shadow-vertex
simplex method introduced in Section 3.3. The analysis of the algorithm will use as a black
box the bound on the expected sizes of shadows proved in the previous section. However,
the analysis is not immediate from this bound.

The most obvious difficulty in applying the shadow bound to the analysis of an algorithm
is that, in the statement of the shadow bound, the plane onto which the polytope was
projected to form the shadow was fixed, and unrelated to the data defining the polytope.
However, in the analysis of the shadow-vertex algorithm, the plane onto which the polytope
is projected will necessarily depend upon data defining the linear program. This is the
dominant complication in the analysis of the number of steps taken to solve $LP'$.

Another obstacle will stem from the fact that, in the analysis of $LP^+$, we need to
consider the expected sizes of shadows of the convex hulls of points of the form $a_i/y_i^+$,
which do not have a Gaussian distribution. In our analysis of $LP^+$, we essentially handle
this complication by demonstrating that in almost every small region the distribution can
be approximated by some Gaussian distribution.

The last issue we need to address is that if $s_{\min}(A_I)$ is too small, then the resulting
values for $y_i'$ and $y_i^+$ can be too large. In Section 5.1 we resolve this problem by proving that
one of $3nd \ln n$ randomly chosen $I$ will have reasonable $s_{\min}(A_I)$ with very high probability.
Having a reasonable $s_{\min}(A_I)$ is also essential for the analysis of $LP'$.

As our two-phase shadow-vertex simplex algorithm is randomized, we will measure its
expected complexity on each input. For an input linear program specified by $A$, $y$ and $z$,
we let

$$\mathcal{C}(A, y, z)$$

denote the expected number of simplex steps taken by the algorithm on input $(A, y, z)$. As
this expectation is taken over the choices for $\mathcal{I}$ and $\alpha$, and can be divided into the number
of steps taken to solve $LP^+$ and $LP'$, we introduce the functions

$$\mathcal{S}'_z(A, y, \mathcal{I}, \alpha),$$

to denote the number of simplex steps taken by the algorithm in step (5) to solve $LP'$ for
a given $A$, $y$, $\mathcal{I}$ and $\alpha$, and

$$\mathcal{S}^+_z(A, y, \mathcal{I}) + 2$$

to denote the number of simplex steps² taken by the algorithm in step (7) to solve $LP^+$
for a given $A$, $y$ and $\mathcal{I}$. We note that the complexity of the second phase does not depend
upon $\alpha$; however, it does depend upon $\mathcal{I}$ as $\mathcal{I}$ affects the choice of $\kappa$ and $M$. We have

$$\mathcal{C}(A, y, z) \le \mathbf{E}_{\mathcal{I}, \alpha}\left[\mathcal{S}'_z(A, y, \mathcal{I}, \alpha)\right] + \mathbf{E}_{\mathcal{I}, \alpha}\left[\mathcal{S}^+_z(A, y, \mathcal{I})\right] + 2.$$

Theorem 5.0.1 (Main) There exists a polynomial $\mathcal{P}$ and a constant $\sigma_0$ such that for every
$n > d \ge 3$, $\bar{A} = [\bar{a}_1, \ldots, \bar{a}_n] \in \mathbb{R}^{n \times d}$, $\bar{y} \in \mathbb{R}^n$ and $z \in \mathbb{R}^d$, and $\sigma > 0$,

$$\mathbf{E}_{A, y}\left[\mathcal{C}(A, y, z)\right] \le \min\left(\mathcal{P}(d, n, 1/\min(\sigma, \sigma_0)),\; \binom{n}{d} + \binom{n}{d+1} + 2\right),$$

where $A$ is a Gaussian random matrix centered at $\bar{A}$ of standard deviation $\sigma \max_i \|(\bar{y}_i, \bar{a}_i)\|$,
and $y$ is a Gaussian random vector centered at $\bar{y}$ of standard deviation $\sigma \max_i \|(\bar{y}_i, \bar{a}_i)\|$.

²The seemingly odd appearance of +2 in this definition is explained by Lemma 3.3.5.

Proof We first observe that the behavior of the algorithm is unchanged if one multiplies
$A$ and $y$ by a power of two. That is,

$$\mathcal{C}(A, y, z) = \mathcal{C}(2^k A, 2^k y, z),$$

for any integer $k$. When $A$ and $y$ are Gaussian random variables centered at $\bar{A}$ and $\bar{y}$ of
standard deviation $\sigma \max_i \|(\bar{y}_i, \bar{a}_i)\|$, $2^k A$ and $2^k y$ are Gaussian random variables centered
at $2^k \bar{A}$ and $2^k \bar{y}$ of standard deviation $\sigma \max_i \|(2^k \bar{y}_i, 2^k \bar{a}_i)\|$. Accordingly, we may assume
without loss of generality in our analysis that $\max_i \|(\bar{y}_i, \bar{a}_i)\| \in (1/2, 1]$.

The Theorem now follows from Proposition 5.0.2 and Lemmas 5.2.1 and 5.3.1.

■
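The normalization by a power of two used in the proof above can be made concrete. The following sketch finds the integer $k$ for which $2^k \max_i \|(\bar{y}_i, \bar{a}_i)\|$ lands in $(1/2, 1]$; the function name and the input layout (a list of pairs `(y_i, a_i)`) are our own assumptions.

```python
import math


def normalizing_exponent(rows) -> int:
    """Return k such that 2**k * max_i ||(y_i, a_i)|| lies in (1/2, 1]."""
    m = max(math.sqrt(y * y + sum(x * x for x in a)) for y, a in rows)
    if m == 0.0:
        raise ValueError("all rows are zero; no normalization exists")
    return -math.ceil(math.log2(m))
```

For instance, a single row with $\bar{y}_1 = 3$ and $\bar{a}_1 = (4, 0)$ has norm 5, so $k = -3$ and the rescaled norm is $5/8$.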

Before proceeding with the proof of Theorem 5.0.1, we state a trivial upper bound on
$\mathcal{S}'$ and $\mathcal{S}^+$:

Proposition 5.0.2 (trivial shadow bounds) For all $A$, $y$, $z$, $\mathcal{I}$ and $\alpha$:

$$\mathcal{S}'_z(A, y, \mathcal{I}, \alpha) \le \binom{n}{d} \quad \text{and} \quad \mathcal{S}^+_z(A, y, \mathcal{I}, \alpha) \le \binom{n}{d+1}.$$

Proof The bound on $\mathcal{S}'$ follows from the fact that there are $\binom{n}{d}$ $d$-subsets of $[n]$. The
bound on $\mathcal{S}^+$ follows from the observation in Lemma 3.3.5 that the number of steps taken
by the second phase is at most 2 plus the number of $(d+1)$-subsets of $[n]$. ■
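The trivial bound is simple to evaluate and gives a sense of how large the worst case is compared with the polynomial bound. A minimal sketch, with a function name of our own choosing:

```python
import math


def trivial_step_bound(n: int, d: int) -> int:
    """C(n, d) + C(n, d + 1) + 2, the second argument of the min in Theorem 5.0.1."""
    return math.comb(n, d) + math.comb(n, d + 1) + 2
```

For example, `trivial_step_bound(5, 3)` is $10 + 5 + 2 = 17$, while for $n = 40$ and $d = 20$ the value already exceeds $10^{11}$.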
5.1 Many Good Choices

For a Gaussian random $d$-by-$d$ matrix $(a_1, \ldots, a_d)$, it is possible to show that the probability
that the smallest singular value of $(a_1, \ldots, a_d)$ is less than $\epsilon$ is at most $O(d^{1/2} \epsilon)$. In this
section, we consider the probability that almost all of the $d$-by-$d$ minors of a $d$-by-$n$ matrix
$(a_1, \ldots, a_n)$ have small singular value. If the events for different minors were independent,
then the proof would be straightforward. However, distinct minors may have significant
overlap. While we believe stronger concentration results should be obtainable, we have
only been able to prove:

Lemma 5.1.1 (Many good choices) For $n > d \ge 3$, let $a_1, \ldots, a_n$ be Gaussian random
variables in $\mathbb{R}^d$ of standard deviation $\sigma$ centered at points of norm at most 1. Let $A =
(a_1, \ldots, a_n)$. Then, we have



Pr 

Ol,...,On 



[Smin{Al) < Ko] > ( 1 - - j f 



where 



Kq 



def crmin(l,cr) 
12d^n^\/lnri 



(35) 



In the analyses of $LP'$ and $LP^+$, we use the following consequence of Lemma 5.1.1,
whose statement is facilitated by the following notation for a set of $d$-sets, $\mathcal{I}$:

$$\mathcal{I}(A) = \mathrm{argmax}_{I \in \mathcal{I}} \left(s_{\min}(A_I)\right).$$

Corollary 5.1.2 (probability of small $s_{\min}(A_{\mathcal{I}(A)})$) For $n > d \ge 3$, let $a_1, \ldots, a_n$ be
Gaussian random variables in $\mathbb{R}^d$ of standard deviation $\sigma$ centered at points of norm at
most 1, and let $A = (a_1, \ldots, a_n)$. For $\mathcal{I}$ a set of $3nd \ln n$ randomly chosen $d$-subsets of $[n]$,

$$\Pr_{A, \mathcal{I}}\left[s_{\min}(A_{\mathcal{I}(A)}) \le \kappa_0\right] \le 0.417 \binom{n}{d}^{-1}.$$
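Proof

$$\begin{aligned}
\Pr_{A, \mathcal{I}}\left[s_{\min}(A_{\mathcal{I}(A)}) \le \kappa_0\right]
&= \Pr_{A, \mathcal{I}}\left[\forall I \in \mathcal{I} : s_{\min}(A_I) \le \kappa_0\right] \\
&\le \Pr_A\left[\sum_I \left[s_{\min}(A_I) \le \kappa_0\right] > \left(1 - \frac{1}{n}\right) \binom{n}{d}\right] \\
&\quad + \Pr_{A, \mathcal{I}}\left[\forall I \in \mathcal{I} : s_{\min}(A_I) \le \kappa_0 \;\middle|\; \sum_I \left[s_{\min}(A_I) \le \kappa_0\right] \le \left(1 - \frac{1}{n}\right) \binom{n}{d}\right] \\
&\le n^{-2.9d+1} + \left(1 - \frac{1}{n}\right)^{|\mathcal{I}|} \quad \text{by Lemma 5.1.1} \\
&\le n^{-2.9d+1} + n^{-3d} \quad \text{as } |\mathcal{I}| = 3nd \ln n, \\
&\le 0.417 \binom{n}{d}^{-1},
\end{aligned}$$

for $n > d \ge 3$. ■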
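The role of the sample size $3nd \ln n$ is easy to see numerically: if at least a $1/n$ fraction of the $d$-subsets is good, then the chance that every one of $3nd \ln n$ independent uniform draws is bad is at most $(1 - 1/n)^{3nd \ln n} \le n^{-3d}$. The sketch below evaluates this quantity; the function name, the rounding of the sample size up to an integer, and the assumption of independent draws with replacement are ours.

```python
import math


def all_bad_probability(n: int, d: int) -> float:
    """Upper bound on Pr[no sampled subset is good], assuming a 1/n good fraction."""
    size = math.ceil(3 * n * d * math.log(n))   # |I| = 3 n d ln n, rounded up
    return (1.0 - 1.0 / n) ** size
```

The inequality follows from $1 - 1/n \le e^{-1/n}$, so the bound holds for every admissible $n$ and $d$, not only asymptotically.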



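We also use the following corollary, which states that it is highly unlikely that $\kappa$ falls
outside the set $\mathcal{K}$, which we now define:

$$\mathcal{K} = \left\{2^{\lfloor \lg(x) \rfloor} : \kappa_0 \le x \le \sqrt{d} + 3d\sqrt{\ln n}\, \sigma\right\}. \qquad (36)$$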

Corollary 5.1.3 (probability of $\kappa$ in $\mathcal{K}$) For $n > d \ge 3$, let $a_1, \ldots, a_n$ be Gaussian
random variables in $\mathbb{R}^d$ of standard deviation $\sigma$ centered at points of norm at most 1, and
let $A = (a_1, \ldots, a_n)$. For $\mathcal{I}$ a set of $3nd \ln n$ randomly chosen $d$-subsets of $[n]$,



Pr 

AX 



< 0.42 



n 



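Proof

It follows from Corollary 5.1.2 that

$$\Pr_{A,\mathcal{I}}\left[s_{\min}(A_{\mathcal{I}(A)}) < \kappa_0\right] < 0.417\binom{n}{d}^{-1}.$$

On the other hand, as

$$s_{\min}(A_I) \le \|A_I\| \le \sqrt{d}\max_i \|a_i\|,$$

we have

$$\Pr_{A,\mathcal{I}}\left[s_{\min}(A_{\mathcal{I}(A)}) > \sqrt{d}\left(1 + 3\sqrt{d\ln n}\,\sigma\right)\right] \le \Pr_{A}\left[\max_i \|a_i\| > 1 + 3\sqrt{d\ln n}\,\sigma\right] \le 0.0015\binom{n}{d}^{-1},$$

by Corollary 2.4.6. ■

Proposition 5.1.4 (size of $\mathcal{K}$)

$$|\mathcal{K}| \le 9\lg(nd/\min(\sigma,1)).$$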
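To make the set $\mathcal{K}$ of (36) concrete, the following sketch enumerates it for given $n$, $d$, $\sigma$ and compares its size with the bound of Proposition 5.1.4. It assumes the reading $\kappa_0 = \sigma\min(1,\sigma)/(12d^2n^7\sqrt{\ln n})$ from (35); the function names are hypothetical and the code is an illustration, not part of the analysis.

```python
from math import floor, log, log2, sqrt


def kappa0(n: int, d: int, sigma: float) -> float:
    """kappa_0 as read from equation (35) (assumed form)."""
    return sigma * min(1.0, sigma) / (12 * d**2 * n**7 * sqrt(log(n)))


def likely_kappas(n: int, d: int, sigma: float) -> list:
    """All values 2^floor(lg x) for kappa_0 <= x <= sqrt(d) + 3 d sqrt(ln n) sigma."""
    lower = kappa0(n, d, sigma)
    upper = sqrt(d) + 3 * d * sqrt(log(n)) * sigma
    return [2.0**e for e in range(floor(log2(lower)), floor(log2(upper)) + 1)]


def size_bound(n: int, d: int, sigma: float) -> float:
    """Right-hand side of Proposition 5.1.4."""
    return 9 * log2(n * d / min(sigma, 1.0))


if __name__ == "__main__":
    for n, d, sigma in [(4, 3, 1.0), (4, 3, 0.01), (20, 10, 5.0)]:
        print(n, d, sigma, len(likely_kappas(n, d, sigma)), size_bound(n, d, sigma))
```

Because only powers of two are kept, the set has logarithmic size, which is what allows the later proofs to sum over every likely value of $\kappa$.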

The rest of this section is devoted to the proof of Lemma 5.1.1. The key to the proof is 
an examination of the relation between the events which we now define. 

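Definition 5.1.5 For $I \in \binom{[n]}{d}$, $K \in \binom{[n]}{d-1}$, and $j \notin K$, we define the indicator random
variables

$$X_I = \left[s_{\min}(A_I) < \kappa_0\right], \text{ and}$$

$$Y_K^j = \left[\mathrm{dist}(a_j, \mathrm{Span}(A_K)) < h_0\right],$$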



where 



, def cr 



In Lemma 5.1.8, we obtain a concentration result on the $Y_K^j$s using the fact that the $Y_K^j$
are independent for fixed $K$ and different $j$. To relate this concentration result to the $X_I$s,
we show in Lemma 5.1.9 that when $X_I$ is true, it is probably the case that $Y_{I-\{j\}}^j$ is true
for most $j$.

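Proof of Lemma 5.1.1 The proof has two parts. The first, and easier, part is Lemma 5.1.8
which implies

$$\Pr_{a_1,\ldots,a_n}\left[\sum_{K}\sum_{j \notin K} Y_K^j \le \left\lceil\frac{n-d-1}{2}\right\rceil\binom{n}{d-1}\right] \ge 1 - n^{-n+d-1}.$$

To apply this fact, we use Lemma 5.1.9, which implies

$$\Pr_{a_1,\ldots,a_n}\left[\sum_{K}\sum_{j \notin K} Y_K^j \ge \frac{d}{2}\sum_{I} X_I\right] \ge 1 - n^{-d} - n^{-2.9d+1}.$$

Combining these two Lemmas, we obtain

$$\Pr_{a_1,\ldots,a_n}\left[\frac{d}{2}\sum_{I} X_I \le \left\lceil\frac{n-d-1}{2}\right\rceil\binom{n}{d-1}\right] \ge 1 - n^{-d} - n^{-n+d-1} - n^{-2.9d+1}.$$

Observing,

$$\frac{2}{d}\left\lceil\frac{n-d-1}{2}\right\rceil\binom{n}{d-1} \le \frac{n-d}{d}\binom{n}{d-1} = \frac{n-d}{n-d+1}\binom{n}{d} = \left(1 - \frac{1}{n-d+1}\right)\binom{n}{d} \le \left(1 - \frac{1}{n}\right)\binom{n}{d},$$

we obtain

$$\Pr_{a_1,\ldots,a_n}\left[\sum_{I} X_I \le \left(1 - \frac{1}{n}\right)\binom{n}{d}\right] \ge 1 - n^{-d} - n^{-n+d-1} - n^{-2.9d+1}.$$

■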

Lemma 5.1.6 (Probability of $Y_K^j$) Under the conditions of Lemma 5.1.1, for all $K \in \binom{[n]}{d-1}$ and $j \notin K$,

$$\Pr_{a_1,\ldots,a_n}\left[Y_K^j\right] \le \frac{h_0}{\sigma}.$$

Proof Follows from Proposition 2.4.7. ■



Lemma 5.1.7 (Sum over j of Y^) Under the conditions of Lemma 5.1.1, for all K G 
( ["] ^ 

. ^ „ , \{n-d+l)/2\ 

Y^Y=^>\{n-d+l)/2\ 



Pr 

ai,...,o„ 



Proof Using the fact that for fixed K, the events are independent, we compute 



Pr 

ai,...,an 



J2Y^>\in-d+l)/2] 



< 



J ( [n]-K 



Pr 

ai,...,a„ 



yl 



< 



E 

[n]- 
|-(„_d+l)/21 

E n 

J-J- ai,...,o 

J / ln]-K \jeJ 

/hn\ r{n-d+i)/2i 



< 



r{n-d+l)/21y 





4, ,r(n-d+l)/2l 



£7 



by Lemma 5.1.6, 



as 



( [n]-K \ 

y\{n-d+l)/2-]) 



Lemma 5.1.8 (Sum over K and j of Y^) Under the conditions of Lemma 5.1.1, 



Pr 

ai,...,a„ 



E E^i> 



n — d — 1 



d- 1 
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Proof If Z]xe( ) ^i^-R"^^ ^ [" " ^ '*' ] (d-i)' ^^^"^ there must exist a K for which 
T^j^K > l^^^P^] ' which implies for that K 



n — d — 1 



+ 1 



n-d+i 



Using this trick, we compute 

n — d — 1 



Pr 

Oi,...,a„ 



(by Lemma 5.1.7) 



n 
d-1 



< Pr 

Ol,...,0„ 



n 



3K G 



> 



n- d+l' 



,a — 1 / oi,...,o 



< 



n 
d-1 



r(„_d+i)/2i 



n - d+ 1" 







Vn-d4 

















4/ir 



r(n-d+l)/2] 



r(n-d+l)/2] 



The other statement needed for the proof of Lemma 5.1.1 is: 
Lemma 5.1.9 (Relating $X$s to $Y$s) Under the conditions of Lemma 5.1.1,



Pr 

Ol,. ..,an 



Proof Follows immediately from Lemmas 5.1.10 and 5.1.12. 



Lemma 5.1.10 (Geometric condition for bad /) If there exists a d-set I such that 

Xi and Y.^L{j}^d/2, 

then there exists a set L C I, \L\ = [d/2 — Ij and a jo £ I — L such that 

max,- lla,- 



dist (Cjo, Span{AL)) < VdKo ( 1 + 
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Proof Let / = By Proposition 2.2.6 (a), Xi implies the existence of 

Uh,...,Ui^, \\{ui-,,...,UiJ\\ = 1, such that 



iei 



< Kq. 



On the other hand, X^je/^z-jy} — ^/"^ impUcs the existence of a J C /, \J\ = \d/2'], such 

that Yj_^jy = for ah j G J. By Lemma 5.1.11, this imphes \uj\ < kq/^o for all j € J. As 

\\{uij^, . . . ,Ui^)\\ = 1 and ko/^o < 1/v^, there exists some jo £ /— J such that {ujqI > 1/y/d. 
Setting L = I — J — {jo}, we compute 



< 



'JO 







< KO + 

















< (Vkjol) ^0 + 










3&J 



< Vd I Kq + \Uj\ \\aj\ 



dist (fljo, Span (A/,)) < \/d ( kq + 





'd' 


^KO + 


2 


■■■,0'd 


6e 



ACQ maxj II Oj 

^0 



unit vector such that 



1=1 



//dist (^aj, Span(j^ai}^^j^^ > Hq, then \uj\ < ko/Hq. 



Proof We have 



i=l 



aj + '^{ui/uj)ai < Ko/ \uj 
dist (^Oj,Span ({oilj^^)) < ko/ | 



from which the lemma follows. 
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Lemma 5.1.12 (Probability of bad geometry) Under the conditions of Lemma 5.1.1, 



Pr 

Ol,...,On 



3L G (Lrf/2lij)'Jo L such that 

dist (a,o, 5pan(^L)) < Vd/^o (l + \f\ ^^^iiM) 



Proof We first note that 



Pr 

ai,...,On 



< Pr 
oi,...,a„ 



dist (a,„, Span {Al)) < Vdno (l + H] ^ 



dist {aj„ Span (A^)) < v^acq (l + \l] ^+'^" ) 



+ Pr 

Ol,...,0„ 



max 



laJI > 1 + SV din 



na 



(37) 
(38) 



We now apply Proposition 2.4.7 to bound (37) by 



ai,...,o„ 



dist (Ojo, Span {Al)) < Vd^o 1 + 



na 

hn 



1 + 3\/ d In na 
ho 



d-\L\ 



(39) 



To simplify this expression, we note that < for d > 3. We then recall 

Ko min((T, 1) 



^0 Sd^n^Vlnn' 



and apply d > 3 to show 
s/dno 





'd' 






2 



1 + 3W Inner \ VdKo / 2^3/2 ^ 

; < h 1— -^^ h 2d V Inn 



< 



n 



3" 



So, we have 



'=""^l[<i/2"-ljl("-'^''2+''" 



n" 



rd/21 



< ^L'i/2-lJ+lyj-3d/2 



< n 



(40) 

(41) 
(42) 



On the other hand, we can use Corollary 2.4.6, to bound (38) by n 2.9d+i 
5.1.1 Discussion

It is natural to ask whether one could avoid the complication of this section by setting
$I = \{1, \ldots, d\}$, or even choosing $I$ to be the best $d$-set in $\{1, \ldots, d+k\}$ for some constant $k$.
It is possible to show that the probability that all $d$-by-$d$ minors of a perturbed $d$-by-$(d+k)$
matrix have condition number at most $\epsilon$ grows like $(\sqrt{d}\epsilon/\sigma)^k$. Thus, the best of these sets
would have reasonable condition number with polynomially high probability. This bound
would be sufficient to handle our concerns about the magnitude of $y'_i$. The analysis in
Lemma 5.2.4 might still be possible in this situation; however, it would require considering
multiple possible splittings of the perturbation (for multiple values of $\tau_1$), and it is not clear
whether such an analysis can be made rigorous. Finally, it seems difficult in this situation
to apply the trick in the proofs of Lemmas 5.3.1 and 5.2.1 of summing over all likely values
for $\kappa$. If the algorithm is given $\sigma$ as input, then it is possible to avoid the need for this trick
(and such an analysis appeared in an earlier draft of this paper). However, we believe
that it is preferable for the algorithm to make sense without taking $\sigma$ as an input.

While choosing $I$ in such a simple fashion could possibly simplify this section, albeit at
the cost of complicating others, we feel that once Lemma 5.1.1 has been improved and the
correct concentration bound has been obtained, this technique will provide the best bounds.

One of the anonymous referees pointed out that it should be possible to use the rank
revealing QR factorization to find an $I$ with almost maximal $s_{\min}(A_I)$ (see [CH92]). While
doing so seems to be the best choice algorithmically, it is not clear to us how we could
analyze the smoothed complexity of the resulting two-phase algorithm. The difficulty is
that the assumption that a particular $I$ was output by the rank revealing QR factorization
would impose conditions on $A$ that we are currently not able to analyze.
5.2 Bounding the shadow of $LP'$

Before beginning our analysis of the shadow of $LP'$, we define the set from which $\alpha$ is
chosen to be $\Delta_{1/d^2}$, where we define

$$\Delta = \{\alpha : \langle\alpha|\mathbf{1}\rangle = 1\}, \text{ and}$$

$$\Delta_\delta = \{\alpha : \langle\alpha|\mathbf{1}\rangle = 1 \text{ and } \alpha_i \ge \delta, \forall i\}.$$

The principal obstacle to proving the bound for $LP'$ is that Theorem 4.0.1 requires one
to specify the plane on which the shadow of the perturbed polytope will be measured before
the perturbation is known, whereas the shadow relevant to the analysis of $LP'$ depends
on the perturbation: it is the shadow onto $\mathrm{Span}(A_I\alpha, z)$. To overcome this obstacle, we
prove in Lemma 5.2.4 that if $s_{\min}(\tilde{A}_I) \ge \kappa_0/2$, then the expected size of the shadow
onto $\mathrm{Span}(A_I\alpha, z)$ is close to the expected size of the shadow onto $\mathrm{Span}(\tilde{A}_I\tilde{\alpha}, z)$, where
$\tilde{\alpha}$ is chosen from $\Delta_0$. As this plane is independent of the perturbation, we can apply
Theorem 4.0.1 to bound the size of the shadow on this plane. Unfortunately, $\bar{A}$ is arbitrary, so
we cannot make any assumptions about $s_{\min}(\bar{A}_{\mathcal{I}(A)})$. Instead, we decompose the perturbation
into two parts, as in Corollary 4.3.2, and can then use Corollary 5.1.2 to show that
with high probability $s_{\min}(\tilde{A}_{\mathcal{I}(A)}) \ge \kappa_0/2$. We begin the proof with this decomposition,
and build to the point at which we can apply Lemma 5.2.4.

A secondary obstacle in the analysis is that $\kappa$ and $M$ are correlated with $A$ and $y$. We
overcome this obstacle by considering the sum of the expected sizes of the shadows when $\kappa$
and $M$ are fixed to each of their likely values. This analysis is facilitated by the notation

T!^{A, I, a, K, M) =^ |ShadowA^„,z (ai, y') I > w^^^^ y- = \ ^3 /. ^ ^• 

I VaM^/4«; otherwise. 

We note that 



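Lemma 5.2.1 ($LP'$) Let $d \ge 3$ and $n \ge d + 1$. Let $\bar{A} = [\bar{a}_1, \ldots, \bar{a}_n] \in \mathbb{R}^{n \times d}$, $\bar{y} \in \mathbb{R}^n$ and
$z \in \mathbb{R}^d$ satisfy $\max_i \|(\bar{y}_i, \bar{a}_i)\| \in (1/2, 1]$. For any $\sigma > 0$, let $A$ be a Gaussian random matrix
centered at $\bar{A}$ of standard deviation $\sigma$, and let $y$ be a Gaussian random vector centered at
$\bar{y}$ of standard deviation $\sigma$. Let $\alpha$ be chosen uniformly at random from $\Delta_{1/d^2}$ and let $\mathcal{I}$ be a
collection of $3nd\ln n$ randomly chosen $d$-subsets of $[n]$. Then,

$$\mathbf{E}_{\mathcal{I},A,y,\alpha}\left[S'(A,y,\mathcal{I},\alpha)\right] \le 326\,nd(\ln n)\lg(dn/\min(1,\sigma))\,\mathcal{D}\left(d, n, \frac{\min(1,\sigma^4)}{12{,}960\,d^{8.5}n^{14}\ln^{2.5} n}\right),$$

where $\mathcal{D}(d,n,\sigma)$ is as given in Theorem 4.0.1.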

Proof Instead of treating $A$ as a perturbation of standard deviation $\sigma$ of $\bar{A}$, we will
view $A$ as the result of applying a perturbation of standard deviation $\tau_0$ followed by a
perturbation of standard deviation $\tau_1$, where $\tau_0^2 + \tau_1^2 = \sigma^2$. Formally, we will let $\tilde{G}$ be a
Gaussian random matrix of standard deviation $\tau_0$ centered at the origin, $\tilde{A} = \bar{A} + \tilde{G}$, $G$ be
a Gaussian random matrix of standard deviation $\tau_1$ centered at the origin, and $A = \tilde{A} + G$,
where



def 



nn 

and $\tau_0^2 = \sigma^2 - \tau_1^2$. We similarly decompose the perturbation to $y$ into a perturbation of
standard deviation $\tau_0$ from which we obtain $\tilde{y}$, and a perturbation of standard deviation $\tau_1$
from which we obtain $y$. We will let $h = y - \tilde{y}$.
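The decomposition used here rests on the fact that the sum of two independent centered Gaussians of standard deviations $\tau_0$ and $\tau_1$ is a Gaussian of standard deviation $\sqrt{\tau_0^2 + \tau_1^2}$. The one-dimensional sketch below illustrates this with made-up values of $\sigma$ and $\tau_1$; the function names are hypothetical and the specific $\tau_1$ used in the proof is not reproduced.

```python
import random
from math import sqrt


def split_std(sigma: float, tau1: float) -> float:
    """Return tau0 such that tau0^2 + tau1^2 = sigma^2."""
    if not 0 < tau1 <= sigma:
        raise ValueError("need 0 < tau1 <= sigma")
    return sqrt(sigma**2 - tau1**2)


def two_stage_sample(center: float, sigma: float, tau1: float, rng: random.Random) -> float:
    """Perturb `center` by std tau0, then perturb the result by std tau1."""
    tau0 = split_std(sigma, tau1)
    intermediate = center + rng.gauss(0.0, tau0)   # plays the role of the first-stage point
    return intermediate + rng.gauss(0.0, tau1)     # plays the role of the fully perturbed point


def sample_variance(xs: list) -> float:
    """Unbiased sample variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [two_stage_sample(2.0, 1.0, 0.3, rng) for _ in range(20000)]
    print(sample_variance(xs))  # should be close to sigma^2 = 1.0
```

The point of splitting is that the first stage can be conditioned on (it fixes a well-conditioned intermediate matrix), while the second stage still supplies the randomness needed for the shadow bound.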
We can then apply Lemma 5.2.2 to show 



Pr 

I,A,G 



< 0.42 



(43) 



One difficulty in bounding the expectation of $T'$ is that its input parameters are correlated.
To resolve this difficulty, we will bound the expectation of $T'$ by the sum over the
expectations obtained by substituting each of the likely choices for $\kappa$ and $M$.

In particular, we set 

M = |2rig^l+2 : ^max \\{yi, ai)\\^ - SVdlnnn <x< (max \\{yi, a^H^ + SVdlnnrij . 

We now define indicator random variables V, W, X, Y, and Z by 
V = [\M\<2], 



W 

X 
Y 
Z 



max \\{yi, aj)|| < 1 + 3-^ {d + 1) Inner 
i 

Smin (-^X(A)) > /^o/S , 
2LlgSmin(Ax(^))J g ^1 ^ 

2rigma!Ci||(j/i,Oi)|n+2 ^ 



and then expand 



E [S\A,y,I,a)] 

= E [S'{A,y,I,a)VWXYZ] + E [S'{A,y,I,a){l -VWXYZ) 

T,A,y,a I,Ay,oc 

From Corollary 5.1.3, we know



(44) 



Pr [not(y)] = Pr [2LigSmin(^iu))J ^ jc 



A, I 



< 0.42 



n 



Similarly, Corollary 2.4.6 implies for any A and y and n > d> 3. 



_Pr [not(Z)] 



Pr 



2rigmaxj||(2/j,Oi)|n+2 ^ 



< 0.0015 



(45) 



(46) 
From Corollary 2.4.6 we have

Pr [not(l^)] < < 0.0015 



For iQ an index for which ||(yio5 aio)ll ^ 1/2, Proposition 2.4.9 implies 



Pr [not(y)] < .Pr \\{yi„ dij|| < 9V(c? + 1) Innn 



< 0.01 



By also applying inequality (43) to bound the probability of not(X), we find 



n 



As 



Pr [(1 - VWXYZ) = 1] < 0.86, ^ 

A,y,X \d 



S'{A,y,I,a) < Q, (by Proposition 5.0.2) 



the second term of (44) can be bounded by 1. 
To bound the first term of (44), we note 



E [S'{A,y,I,a)VWXYZ] 

T,A,y,a 



< E 



VW B [T' iA,I{A),a,K,M)XW] 



Moreover, 

E [T' {A,I{A),ot,K,M)XW] 



G,h,a 

= -E 

G,h,a 

< E 

G,h,OL 

< E 



Y W [s^in (1/) > Ko/2 



Vr' (4,7,q:,«;,M) 1^ and Smin f > ko/2 
^_E W' {A,I,(x,K,M) and Smin (^7) > «o/2 



lei 



< y (6 + 10-^)P f d, n, = 



ri 



^ (2 

< 3(6 + 10-*^)nd(lnn) {V(d,n, 



(47) 



ncr 

4ri«; 

){VdM) 



by Lemma 5.2.3, 
Thus,



(47) < E 



< E 

< E 



VW + 10~^)"'^(ln n)^) (d, n, 

3(6 + 10-^)nd(lnn)(y \M\) |/C| WV ^d, n, 
6(6 + 10-^)nd(ln n) |/C| W^P n, 

< 6(6 + 10-^)nci(ln n) |/C| P | d, n, 



4riK 



(2 + 3Vcilnnc7)(^/dM) 

4ti min(/C) 
(2 + 3Vdlnna)(Wmax(A1)) 
4ri min(/C) 



(2 + 3\/d In na) {Vd max(A^) ) 
2tiKo 



\/d(2 + 3Vdlnna){l + 6v^(d + 1) Inna) 



where the last inequality follows from $\max(\mathcal{M}) \le 1 + 6\sqrt{(d+1)\ln n}\,\sigma$ when $W$ is true.
To simplify, we first bound the third argument of the function $\mathcal{D}$ by:



Vd{2 + 3Vdlnncr)(l + 6^y {d + 1) In na) 

1 k2 



3d3\/hm \/d(2 + 3\/dlnno-)(l + 6^/(0? + 1) Inner) 

2 ^2/_- n ^^^2 



1 



1 



cj (min(l, (t))^ 



> 



3d3-5Vh^ Vl2d2n7\/hmy (2 + 3Vdln na)il + 6v^(d + 1) Inner) 
1 minfl,(j^) 



> 



432d7-5ni4(lnn)i-5 (2 + 3v'dh^)(l + 6^y{d+ l)\nn) 

1 min(l,(T^) 
432d7-5ni4(lnn)i-5 30dlnn 
_ min(l, (7^) 
~ 12, 960d8-5ni4 In^-^ n 

where the last inequality follows from the assumption that $n > d \ge 3$.

Applying Proposition 5.1.4 to show $|\mathcal{K}| \le 9\lg(dn/\min(1,\sigma))$, we now obtain



(47) < 6(6 + 10"^) |/C| nd(lnn)P d, n, 



min(l, (7^) 



< 325nd{lnn) \g{dn/ min(l,c7))I? I d,n 



12, 960d8-5ni4 In2-5 n 
min(l, cr^) 



12,960d8-5ni4ln2-^ny ' 



Lemma 5.2.2 (probability of small Smin ^-ix(jl)^) -^o'^ ^> and I as defined in the 
proof of Lemma 5.2.1, 

-1 



Pr 

I,A,G 



{Ma)) < 1^0/2 



< 0.42 
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Proof Let I = T{A)^ we have 



Pr 



Smin (^7) < 



< 



Pr 



Smin {Ai) < Ko|Smin [^I^ < '^o/'^ 



From Corollary 5.1.2, we have

Pr[smi„(^/) <Ko] < 0.417 (^^^ , 

On the other hand, we have 

Pr Smin {A-i) > Ko|Smin (-^j) < Ko/2 

Smin (^/) > and Smin (^z) < ^^o/^ 



< Pr 

< Pr 



At- A, 



> Ko/2 



< Pr 

~ A 

= Pr 
A 



max II Cj — flill > /«o/2\/d 



by Proposition 2.2.6 (b), 
by Proposition 2.2.4 (d). 



max llflj — fli II > 

i 

2.9d+l 



< n 

by Corollary 2.4.6. Thus 



(5.2) < 



0.417Q)-^ 



< n 42 

1 _ „-2.M+l - V ^ 



n 



-1 



ioT n > d > 3. 



Lemma 5.2.3 (From a) Let I be a set in (j-^) and let ai, . . . , a„ be points each of norm 
at most 1 + 3V(d+l)ln na such that 



Then, 



E [Shadow A^a ^ (d, . 



• ,an;y')] ^ (6+io-^)P U, 



n 



(2 + S-v/dhmcj) (maxj y'-/ min^ y^) 

(48) 
Proof We apply Lemma 5.2.4 to show



E [\ShadoMVAioc,z{ai,---,an;y')\] <6 ^ Shadow^^^ ^ (oi, . . . , Cn; y') 



< 6 max E 
aeAo A L 



< 62? d,n, 



Shadow^^^^ (oi, . . . , 0„; y') 

n 



(2 + SVdln na){m.siXi y'J minj y- 



+ 1 
+ 1 



< (6 + 10-^)P d,n 



Tl 



(2 + 3Vd\n na)(maxi j/^/ miiij y-) 
by Corollary 4.3.3 and the fact that $\mathcal{D}(n, d, \sigma) \ge 58{,}888{,}678$ for any positive $n$, $d$, $\sigma$. ■

Lemma 5.2.4 (Changing a to a) Let I G ^^}) ■ Let Oi,...,a„ he Gaussian random 
vectors in H*^ of standard deviation ti, centered at points di, . . . , d„. If s^in (■^^) — '^o/2, 



^ [|Shadow^^a,z (oi, . . . , a„; y') |] < 6 Shadow^^^ ^ (oi, . . . , a„; y') 



+1. 



Proof The key to our proof is Lemma 5.2.5. To ready ourselves for the application of 
this lemma, we let 

TA{t) = |ShadoWf,z (ai, . . . , a„; y') \ , 



and note that J^A{t) = TA{t/ \\t\\). If A- A < 3d\/lnnTi, then 

2 



< 



A-' 



I- A A 

By Proposition 2.2.6 (b), 



A- A 



< I — I 3d^A^ 



— f \ 3(i\/lnnKo 1 

""^^ - v^y 12^3 VhT^ ~ 2^2" 



^ - A 



> Ko/2 — 3(i\/ln nri 



- 2 V 2d2 

- 2 I18; ' 



for d>3. So, we can similarly bound 

I-A-^A 

We can then apply Lemma 5.2.5 to show 



< 



17^2- 



cxeA 



E [| Shadow (ai, . . . , a„; t/') |] < 6 _E^ Shadow^^^^^^ (ai, . . . , a^; y') 



l/d2 
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Prom Corollary 2.4.6 and Proposition 2.2.4 (d), we know that the probability that A — A > 

SdVlnnri is at most ^-2-9^+1. As Shadow^^^ ^ (ffli, . . . , fflni 2/') ^ G)' can apply 
Lemma 2.3.3 to show 



E 

A 



E [|Shadow^,a,z (oi,---,On;y') 



< 6E 

~ A 



E 



Shadowj^^ - ^ (oi, . . . , an; y') 



+1. 



To compare the expected sizes of the shadows, we will show that the distribution 
Span(Aa,z) is close to the distribution Span^^Q:,z^. To this end, we note that for 

a given a. ^ Aq the a G A for which Aa. is a positive multiple of Aa is given by 



def A ^AOL 



A-^Aall 



(49) 



To derive this equation, note that A6t is the point in A (ai, . . . , a^) specified by a. A~^A6t 
provides the coordinates of this point in the basis A. Dividing by (^A~^ Adt\l^ provides 
the ex. e A specifying the parallel point in AfF (oi, . . . , aa). We can similarly derive 



A Aa 

A ^Aall 



Our analysis will follow from a bound on the Jacobian of ^. 

Lemma 5.2.5 (Approximation of ot by a) Let T{x) he a non-negative function de- 
pending only on x/ \\x\\. If S = 
e < 9/17d^ then 

E [F{Aoc)] <6 E \j^{Aa) 



I-A ^A 


< e, and 


I-A-^A 


< e, where 











Proof Expressing the expectations as integrals, the lemma is equivalent to 

6 

:/-{Aa)da < — 



[ J^{Aa)da < ^ , / J^{A&)d&. 

H^5)Jc.eAs Vol(Ao)7aeAo 



Vol (A;^) .L^ 4 / ' - Vol (Ao) yaeAo ' 
Applying Lemma 5.2.7 and setting a = ^'(a), we bound 



Vol I 



^— / JP-(Aa) da < ^ , / 



1 



J^{Aa) da 



Vol (As) J^^Ao 



da. 



da 



OL] 



d-^{a) 



da 



da 
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(as AoL is a positive multiple of A'^{6l) and T{x) only depends on cc/ 



< max 



max 



d-^{a.) 



da. 
d-^{dc) 



1 



< 



da. 



Vol {As) joc&Ao 
Vol (Aq) 
Vol {As) 



J^{Aa.) da 
J Vol (^o) JaGAo 



(1 - eVd)d{l -e)\l-dSj Vol(Ao) JaeAo 
(by Proposition 5.2.6 and Lemma 5.2.10) 

Vol(Ao)y«eAo 
for e < 9/17^2, S = 1/d^ and d > 3. ■ 
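The constant 6 in Lemma 5.2.5 comes from evaluating the product of the Jacobian bound and the volume ratio at the stated parameters. The sketch below evaluates that product under an assumed reading of the garbled bound, namely $\frac{(1+\epsilon)^d}{(1-\epsilon\sqrt{d})^d(1-\epsilon)}\left(\frac{1}{1-d\delta}\right)^d$ with $\epsilon = 9/17d^2$ and $\delta = 1/d^2$; in particular the exponent $d$ on the volume factor is an assumption, and the function name is hypothetical.

```python
from math import sqrt


def lemma_5_2_5_factor(d: int) -> float:
    """Assumed form of the constant bounded by 6 in the proof of Lemma 5.2.5."""
    eps = 9.0 / (17 * d**2)      # largest epsilon allowed by the lemma
    delta = 1.0 / d**2           # delta fixed by the lemma
    jacobian_bound = (1 + eps) ** d / ((1 - eps * sqrt(d)) ** d * (1 - eps))
    volume_ratio = (1.0 / (1 - d * delta)) ** d   # assumed exponent d
    return jacobian_bound * volume_ratio


if __name__ == "__main__":
    for d in (3, 4, 5, 10, 100):
        print(d, lemma_5_2_5_factor(d))
```

Under this reading the product is largest at $d = 3$, where it is a little below 6, and decreases towards $e$ as $d$ grows, which is consistent with the constant stated in the lemma.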

Proposition 5.2.6 (Volume dilation) 

Vol(Ao)_ 



J^{Aa) da 



Vol (^5) \l-dS 

Proof The set As may be obtained by contracting the set at the point {l/d,l/d,. . 
by the factor (1 — d6). ■ 

Lemma 5.2.7 (Proper subset) Under the conditions of Lemma 5.2.5, 

As C *(Ao). 

Proof We will prove 



.,1/d) 



^-\As) c ^0- 



- -1 



Let a G As, a' = A Aa and a = a'/ (a'|l). Using Proposition 2.2.2 to show ||q;|| < 
||q;||^ = 1 and Proposition 2.2.4 (a), we bound 



I-A^A 



\a\\ > 6-e > 0. 



So, all components of a' are positive and therefore all components of a = a'/ {a'\l) are 
positive. ■ 

We will now begin a study of the Jacobian of This study will be simplified by 
decomposing ^ into the composition of two maps. The second of these maps is given by: 

Definition 5.2.8 ($\Gamma_{u,v}$) Let $u$ and $v$ be vectors in $\mathbb{R}^d$ and let $\Gamma_{u,v}(x)$ be the map from
$\{x : \langle x|u\rangle = 1\}$ to $\{x : \langle x|v\rangle = 1\}$ by

$$\Gamma_{u,v}(x) = \frac{x}{\langle x|v\rangle}.$$

Figure 5: $\Gamma_{u,v}$ can be understood as the projection through the origin from one plane onto
the other.



Lemma 5.2.9 (Jacobian of 



doc 



det A 'A 



111 




(^A-^Aa\iy 




T 

1 



Proof Let ot = ^'(q;) and let a' = A ^Aa. As {a\l) = 1, we have 

a'ld^^AYl) = I. 



So, a = r„„(Q;'), where u = (A A) 1 and v = 1. By Lemma 5.2.11, 



T 



da 



da. 



da 



da' 



da' 



da 



da 

det (^A-^A 



A-^Aa\l 



a-'aYi 



Lemma 5.2.10 (Bound on Jacobian of ^) Under the conditions of Lemma 5.2.5, 



d^{a) 



da 



< 



{l + eY 



{1 - eVd)'^{l - e)' 



for all a ^ Aq. 
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Proof The condition 
implies 



det 



< e implies 



< 1 + e, so Proposition 2.2.4 (e) 



(a-^AJ < (l + e)''. 



Observing that II 1 II = and I - {A ^ A)'^ = I - (A A 



- -1 



, we compute 



{A'^Afl 



> mil - 



l-{A~'Afl 



> Vd- 



I -{A Af 



|1|| > Vd-eVd. 



So, 



< 



1-e 



Finally, as = 1 and ||a|| < 1, we have 

'A-^Aa\l^ = (all) + (^A'^Aa - ajl 
= 1 + ({A-^A - I)a\l 



> 1 - 



A~^A-I 



\a\\ ||1| 



> 1 - eVd. 



Applying Lemma 5.2.9, we have 



da. 



det 



A-'Aoc\l 



T 



< 



(1 - eVdY{l - e) 



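Lemma 5.2.11 (Jacobian of $\Gamma_{u,v}$)

$$\left|\det\left(\frac{\partial\,\Gamma_{u,v}(x)}{\partial x}\right)\right| = \frac{\|v\|}{\langle x|v\rangle^{d}\,\|u\|}.$$

Proof Consider dividing $\mathbb{R}^d$ into $\mathrm{Span}(u, v)$ and the space orthogonal to $\mathrm{Span}(u, v)$.
In the $(d-2)$-dimensional orthogonal space, $\Gamma_{u,v}$ acts as a multiplication by $1/\langle x|v\rangle$.
On the other hand, the Jacobian of the restriction of $\Gamma_{u,v}$ to $\mathrm{Span}(u, v)$ is computed by
Lemma 5.2.12 to be

$$\frac{\|v\|}{\langle x|v\rangle^{2}\,\|u\|}.$$

So,

$$\left|\det\left(\frac{\partial\,\Gamma_{u,v}(x)}{\partial x}\right)\right| = \left(\frac{1}{\langle x|v\rangle}\right)^{d-2}\frac{\|v\|}{\langle x|v\rangle^{2}\,\|u\|} = \frac{\|v\|}{\langle x|v\rangle^{d}\,\|u\|}.$$

■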
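Lemma 5.2.12 (Jacobian of $\Gamma_{u,v}$ in 2D) Let $u$ and $v$ be vectors in $\mathbb{R}^2$ and let $\Gamma_{u,v}(x)$
be the map from $\{x : \langle x|u\rangle = 1\}$ to $\{x : \langle x|v\rangle = 1\}$ by

$$\Gamma_{u,v}(x) = \frac{x}{\langle x|v\rangle}.$$

Then,

$$\left|\det\left(\frac{\partial\,\Gamma_{u,v}(x)}{\partial x}\right)\right| = \frac{\|v\|}{\langle x|v\rangle^{2}\,\|u\|}.$$

Proof Let $R = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, the 90° rotation counter-clockwise. Let

$$u^{\perp} = Ru/\|u\| \text{ and } v^{\perp} = Rv/\|v\|.$$

Express the $x$ such that $\langle x|u\rangle = 1$ as $x = u/\|u\|^2 + x u^{\perp}$. Similarly, parameterize the line
$\{x : \langle x|v\rangle = 1\}$ by $v/\|v\|^2 + y v^{\perp}$. Then, we have

$$\Gamma_{u,v}\left(u/\|u\|^2 + x u^{\perp}\right) = v/\|v\|^2 + y v^{\perp},$$

where

$$y = \frac{\left\langle u/\|u\|^2 + x u^{\perp} \,\middle|\, v^{\perp}\right\rangle}{\left\langle u/\|u\|^2 + x u^{\perp} \,\middle|\, v\right\rangle}.$$

So,

$$\begin{aligned}
\left|\det\left(\frac{\partial\,\Gamma_{u,v}(x)}{\partial x}\right)\right| = \left|\frac{dy}{dx}\right|
&= \frac{\left|\langle u^{\perp}|v^{\perp}\rangle\left\langle u/\|u\|^2 + x u^{\perp}\,\middle|\,v\right\rangle - \left\langle u/\|u\|^2 + x u^{\perp}\,\middle|\,v^{\perp}\right\rangle\langle u^{\perp}|v\rangle\right|}{\langle x|v\rangle^{2}} \\
&= \frac{\left|\langle u^{\perp}|v^{\perp}\rangle\langle u|v\rangle - \langle u|v^{\perp}\rangle\langle u^{\perp}|v\rangle\right|}{\|u\|^{2}\,\langle x|v\rangle^{2}}, \quad \text{as the terms in } x \text{ cancel,} \\
&= \frac{\langle u/\|u\| \,|\, v\rangle^{2} + \langle u^{\perp}|v\rangle^{2}}{\|u\|\,\|v\|\,\langle x|v\rangle^{2}}, \quad \text{as } R \text{ is orthogonal and } R^{2} = -I, \\
&= \frac{\|v\|}{\|u\|\,\langle x|v\rangle^{2}},
\end{aligned}$$

as $\{u/\|u\|, u^{\perp}\}$ is an orthonormal basis. ■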
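The two-dimensional Jacobian can be checked numerically. The sketch below parameterizes the source line by $t \mapsto u/\|u\|^2 + t\,u^{\perp}$, reads off the coordinate of the image along $v^{\perp}$, and compares a central finite difference with the closed form $\|v\|/(\langle x|v\rangle^2\|u\|)$. That closed form is the reading of Lemma 5.2.12 assumed here; the function names, test vectors and step size are hypothetical choices for illustration.

```python
from math import hypot


def rot90(w):
    """Rotate a 2D vector by 90 degrees counter-clockwise."""
    return (-w[1], w[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def image_coordinate(u, v, t):
    """Return (y, <x|v>) where x = u/|u|^2 + t*u_perp and y is the
    coordinate of Gamma_{u,v}(x) = x/<x|v> along v_perp."""
    nu, nv = hypot(*u), hypot(*v)
    u_perp = tuple(c / nu for c in rot90(u))
    v_perp = tuple(c / nv for c in rot90(v))
    x = (u[0] / nu**2 + t * u_perp[0], u[1] / nu**2 + t * u_perp[1])
    xv = dot(x, v)
    return dot(x, v_perp) / xv, xv


def numeric_jacobian(u, v, t, step=1e-6):
    """Central finite-difference estimate of |dy/dt|."""
    y_plus, _ = image_coordinate(u, v, t + step)
    y_minus, _ = image_coordinate(u, v, t - step)
    return abs(y_plus - y_minus) / (2 * step)


def closed_form_jacobian(u, v, t):
    """Assumed closed form: |v| / (<x|v>^2 |u|)."""
    _, xv = image_coordinate(u, v, t)
    return hypot(*v) / (xv**2 * hypot(*u))


if __name__ == "__main__":
    u, v, t = (2.0, 1.0), (1.0, 3.0), 0.4
    print(numeric_jacobian(u, v, t), closed_form_jacobian(u, v, t))
```

Multiplying this two-dimensional factor by $(1/\langle x|v\rangle)^{d-2}$ for the orthogonal directions gives the $d$-dimensional expression of Lemma 5.2.11.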
5.3 Bounding the shadow of $LP^+$

The main obstacle to proving a bound on the size of the shadow of $LP^+$ is that the vectors
$a_i^+/y_i^+$ are not Gaussian random vectors. To resolve this problem, we will show that, in
almost every sufficiently small region, we can construct a family of Gaussian random vectors
with distributions similar to the vectors $a_i^+/y_i^+$. We will then bound the expected size of
the shadow of the vectors $a_i^+/y_i^+$ by a small multiple of the expected size of the shadow
of these Gaussian vectors. These regions are defined by splitting the original perturbation
into two, and letting the first perturbation define the region.

As in the analysis of $LP'$, a secondary obstacle is the correlation of $\kappa$ and $M$ with $A$
and $y$. We again overcome this obstacle by considering the sum of the expected sizes of the
shadows when $\kappa$ and $M$ are fixed to each of their likely values, and use the notation

n'+l A „ ^ Ai\ 45f / |Shadow(o,2),^+ {at /yt a+/y+) | , if VdM/An > 1 
I U otherwise. 



where 



«^+= ((yi- 2/0/2, fli), 

yt = iy'i + yi)/2, and 

l\/dM^/4«; otherwise. 



y 



By Lemma 3.3.5 and Proposition 3.3.2, we then have 



Lemma 5.3.1 (LP+) Let d > 3 and n > d + 1. Let A = [di, ... , a„] G U"-""^, y e W 
and z G IR'^, satisfy max, \\{yi, a,i)\\ G (1/2, 1]. For any a > 0, let A be a Gaussian random 

matrix centered at A of standard deviation a, and let y by a Gaussian random vector 
centered at y of standard deviation a. Let I be a set ofSndlnn randomly chosen d-subsets 
of [n] . Then, 

(minfl cj^) \ 
^'"' 2^3(,+ l)lW4(ln,)5/2 j+^- 

where V{d,n,a) is as given in Theorem 4-0.1. 

Proof For po and pi defined below, we let G and G be Gaussian random matrices 
centered at the origin of standard deviations po and pi , respectively. We then let A = A+ G 

and A = A + G. We similarly let h and h be Gaussian random vectors centered at the 
origin of standard deviations po and pi, respectively, and let y = y' + h and y = y + h. If 

^ 3^174 

~ V2^n{mn{d + 1)3/2 (In n)3/2) ' 
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we set pi = a. Otherwise, we set pi so that 

3Vl/4 + d(^2„^2) 



pi 



2en(60n((i+ 1)3/2 (In n)3/2)- 



and set Pq = a'^ — pf. We note that 



pi = mm a, 



3^1/4 + dp2 



' v^n(60n(d + 1)3/2 (In n)3/2) y ' 
As in the proof of Lemma 5.2.1, we define the set of hkely values for M: 



>f = |2rig-H2: (^mpc||(y,,a,)||) (^1 - 



9v^(d + l)ln 



n 



(60n(d + 1)3/2 (In n)3/2) 



< X 



( „/~ ~ mA i 9A/(d+ l)lnn \ 

< ^max||(y„a.)||J (^1 + (60„(a + l)3/2(inn)3/2) J 



Observe that $|\mathcal{M}| \le 2$.

As in the proof of Lemma 5.2.1, we define random variables: 



W 



X 



max \\{yi, di)\\ < 1 + 3^/ (d + l)ln npo 

i 

max||(^„a.)||>^^^^ 



2en 
, and 



Y= 2l-'^^'"'"(^2:(4))J G /C 

In order to apply the shadow bound proved below in Lemma 5.3.2, we need 

M>3max||(yi,ai)||, 

t 

and 

M > (60n(ci + 1)^/2 (In n)^/2)/9i. 

From the definition of $M$ and the inequality $1 - 9\sqrt{(d+1)\ln n}/(60n(d+1)^{3/2}(\ln n)^{3/2}) \ge
3/4$, the first of these inequalities holds if $Z$ is true. Given that $Z$ is true, the second
inequality holds if $X$ is also true.
From Corollary 5.1.3, we know



Pr [not(y)] < Pr \2\}^^'^"-i^^(^))\ ^ K 



A,I ' " A,I 
From Corollary 2.4.6 we have 



< 0.42 



Pr [not(P^)] < n-2-'^('^+^)+^ < 0.0015 



< 0.42n 



G+i) • 



n 
d+1 



(50) 



(51) 
From Proposition 2.4.9, we know



Pr [not(X)] = Pr 



max||(yj, ai)\\ < 



+ dpl 



2en 



1 / n 



2A\d+l 



To bound the probability that Z fails, we note that 



max||(yi,ai)|| > 



2en 



and 



max||(yi - yi,ai - ai)\\ < /9i3\/ (d+ l)lnn, 
imply Z is true. Hence, by Corollary 2.4.6 and (52), 



Pr [not(Z)] < n-2-9{<^+i)+i + n-('^+^) < .044 



As in the proof of Lemma 5.2.1, we now expand 



n 
d+1 



E [S+{A,y,I)]= B [S+{A,y,I)WXYZ]+ E [S+{A,y,I){l -WXYZ)] 

I,A,y I,A,y I,A,y 

To bound the second term by n, we apply (51) , (52) , (50) and (53) to show 



Pr [not(M^) or not(X) or not(y) or not(Z)] < n 
A,I 

and then combine this inequality with Proposition 5.0.2. 
To bound the first term of (54), we note 

E [S+{A,y,I)WXYZ] 



n 
d + 1 



< E 



< E 



< E 



< E 
^A,y 

< E 

i,^,y 



WX E_[T+ {A,y,K,M)XZ] 



KeK,MeM 



G,h 



WX J2 KW+{A,y,K,M) 



Keic,MeM 



G,h 



xz 



WX eV[d,n, 
KeK,MeM 



pi rami Vj 
3(maxi y'^^ 

aM 



WX V eV (d,n,-—p^j—^] 
^ V 3(M2/4k;)2 } 



+ 1, 



+ 1 



by Lemma 5.3.2 



>.X|.||M|..|,„,|^,.l 



80 



As min(/C) > ko/2 and W implies max(A4) < 9 (l + 3Y^(<i + 1) Inna^ . 
16crmin(/C)2 ^ IGa^ min(l, o-)^ 



+ 1) In na 
16min(l,cr^) 



> 



> 



3-4(^9(1 + 3^(^+1) Inn) (l2d'^n^V]n^y 
min(l, (T^) 



223(d+l)iV2ni4(lnn)V2- 
Applying this inequality, Proposition 5.1.4, and the fact that X imphes \M\ < 2, we obtain 

(56) <491g(nd/min(.,l))P [d^-^ ^ ■ 



Lemma 5.3.2 (LP+ Shadow, part 2) Let d > 3 and n > d+1. Let y be a Gaussian 
random vector of standard deviation pi centered at a point y, and let ai, . . . ,an be Gaussian 
random vectors in M*^ of standard deviation p\ centered at hi, . . . ,an respectively. Under 
the conditions 

y'i > 3(||yi, ai\\),\/i, and (57) 
y[ > 60n(d + l)^/^ (In nf/^pi , Vz. (58) 



Let 



f^t = {{y'i - ^0/2, CLi) , and 

yt = {yr + yd/2. 



Then, 



Pi minj y • 



[|S-dow,,,,„. (at/yt. .... aM) |] < n, ^L-^ ) + 1. 

Proof We use the notation 

( y'i-m-hi 2{ai + Qi) \ 



.y'r + m + hi y'i + m + hi I ' 



where ffi, . • • , g„ the columns of G and (^i, . . . , /i„) = as defined in the proof of 
Lemma 5.3.1. 

The Gaussian random vectors that we will use to approximate these will come from 
their first-order approximations: 



{Pi,o{hi),P{hi,gi)) = 



y[-yi-h,{2y'J{y[ + yi)) 2a^ + 2g, - ^,(2a,/(y^ + y^) 

y'i + yi ' y'i + m 
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Let £'i{pifi,Pi) be the induced density on {pifi,Pi)- In Lemma 5.3.4, we prove that there 
exists a set B of ((pi,o,Pi), ■ ■ ■ , {Pn,o,Pn)) such that 

^„ Pr [{{p,,o,Pi),...,{Pnfi,Pn))&B]>l-Omi5( l ) , 

and for ((pi,o,Pi), • • • , iPn,o,Pn)) ^ 

n n 

YlMPi,o,Pi) < eJjMi(Pi,o,Pj)- 

i=l i=l 

Consequently, Lemma 2.3.4 allows us to prove 

rrn ^ J|ShadoW(o,2),2+((pi,o,Pi),...,(Pn,o,P„))|] 

<e E [|Shadow(o,^),2+ ((pi,o,Pi),...,(Pn,o,Pn))|] + 1- 

lli=l MPi,0,Pi) 

By Lemma 5.3.3, the densities z>i represent Gaussian distributions centered at points of 
norm at most 

'y'i-yi 2aj 



y'i + Vi ' y'i + iji 



< V5, (by condition (57)) 



whose covariance matrices have eigenvalues at most 

(9pi/2yQ^ < (9/2{60n{d + lf/^{lnnf/^)y < l/9dlnn, (by condition (58)) 
and at least 

(9pi/8yO'. 

Thus, we can apply Corollary 4.3.3 to bound 

rrn J|ShadoW(o,;,),;,+ ((pi,0,Pl),...,(p„,0,Pn))|] 

<ev(d, n, W8max,y^ \ ^ ^ 

\ (1 + V5)(maxi y'-/ mmj y'^ J 

Pi miuj y'- 



<ev[d,n, ;; y ) +1, 



3(maxj y[f 



thereby proving the Lemma. 



Lemma 5.3.3 (i>) Under the conditions of Lemma 5.3.2, the vector ipi,o{hi), p{hi, g^)) is 
a Gaussian random vector centered at 

y'i - iji 



y'i + m y'i + m 



and has a covariance matrix with eigenvalues between (Opi/Sy^ and {9pi/2y'-) . 
Proof Because $(\hat{p}_{i,0}(h_i), \hat{p}_i(h_i, g_i))$ is linear in $(h_i, g_i)$ and $(h_i, g_i)$ is a Gaussian random
vector, $(\hat{p}_{i,0}(h_i), \hat{p}_i(h_i, g_i))$ is a Gaussian vector. The statement about the center of the
distributions follows immediately from the fact that $(h_i, g_i)$ is centered at the origin. To
construct the covariance matrix, we note that the matrix corresponding to the transformation
from $(h_i, g_i)$ to $(\hat{p}_{i,0}(h_i), \hat{p}_i(h_i, g_i))$ is



-^aj.i 



^"^O-i 2 



0,...,0\ 



y'i+V: 



I 



J 



Thus, the covariance matrix of {'Pifl{hi),p{hi,gj)) is given by p\CjCi. 
We now note that 



Vi + Vi 



\ 



-1 


0,...,0\ 




















/ 







) 





( 



\ 



yj+yi 



ai,i 



y[+m 

ai,2 

y'i+yi 



ai,d 

y'i+yi 



0,...,0\ 







As all the singular values of the middle matrix are 1 , and the norm of the right-hand matrix 
is IK^i, dj)|| /{y[ + yi), all the singular values of Q lie between 



Vi + Vi 



1 - 



Vi + Vi 



and 



+ : 



1 + 



a,; 



Vi + Vi 



The stated bounds now follow from inequality (57) . 



Lemma 5.3.4 (Almost Gaussian) Under the conditions of Lemma 5.3.2, let i^i{pifl,Pi) 
he the induced density on {pifl,Pi), and let hiPi.OiPi) be the induced density on ipi,o,Pi)- 
Then, there exists a set B of ((f 1,0,^1), • • • , {Pn,o,Pn)) such that 

(a) Pr[((pi,o,Pi),...,(p„,o,Pn)) e5] > 1 - 0.0015 (^^^ J" \- and 

(6) for all ((pi,o, Pi), ■ ■ ■ , {Pn,o, pj) ^ 

n n 

Y\MPi,o,Pi) < eY[i'i{pi,o,Pi)- 

1=1 i=i 
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Proof Let 

B = . 



((Pl,o(/il),Pl(/il,fll)), • • • , {Pn,o{hn),Pn{hn,gn))) 

such that (hi, g^) < 3^y{d+ l)lnnpi, for 1 < z < n 



Prom inequaUties (57) and (58), and the assumption hi < {d + 1) Innpi, we can show 

y'i + Vi + hi > 0, and so the map from (/ii,9i), • • • i {hn^Qn) to (pi,o,Pi), • • • , {Pnfi.Pn) is 
invertible for (pi,o, Pi), • ■ ■ , (Pn,o, Pn) ^ Thus, we may apply Corohary 2.4.6 to estabhsh 
part (a). 

Part (b) follows directly from Lemma 5.3.5. ■

Lemma 5.3.5 (Almost Gaussian, single variable) Under the conditions of Lemma 5.3.2, 
for all hi and Qi such that {hi, Qi) < 3^ (d + 1) Innpi, 

T^i{Pifl{hi),Pi{hi,gi)) < e^/'^i'i{Pifl{hi),Pi{hi,gi)). 

Proof Let /i(/ii, be the density on {hi, g^). As observed in the proof of Lemma 5.3.4, 
the map from {hi,gi) to {pi,o{hi), Pi{hi, g^)) is injective for {hi,gi) < 3s/ {d + l)lnnpi; 
so, by Proposition 2.5.1, the induced density on Ui is 

1 



l^i{Pi,0,Pi) 

Similarly, 

l>i{Pi,0:Pi) 



det 



( d(pi,o,Pi) \ 
V d{hi,gi) J 



1 



IJ-{hi,gi), where {pi,o,Pi) = {Pi,o{f^i),Pi{hi,gi)). 



I^{hi,gi), where {pi,o,Pi) = {Pi,o{hi),Pi{hi, Sli))- 



det (51^) 

The proof now follows from Lemma 5.3.6, which tells us that 



Khi,gi 

and Lemma 5.3.7, which tells us that 



det 


f d(pi,o,Pi)\ 


det 


\ dihi,gi) J 



< e 



l/lOn 



Lemma 5.3.6 (Almost Gaussian, pointwise) Under the conditions of Lemma 5.3.5, If 



Pifi{hi) =Po{hi), Pi{hi,gi) = Pi{hi,gi), and 

Khi,gi) 
Khi,gi) 



hi,gi 



< 3i^ {d + 1) Innpi, then 
Proof We first observe that the conditions of the lemma imply



hi 



hi{y'i + yi) 
y'i + yi + hi 



and 



We then compute 



exp 



2pf 



(hi,9i) 



9i{y'i + lli) 

y'i + yi + hi 



2hi{y'i + yi) + h'i 
{y\ + yi + hi? 



(59) 



Assuming 
most 



{hitOi) ^ 3a/ {d + 1) Inrapi, the absolute value of the exponent in (59) is at 
9(d+l)ln?i / 2hi{y'i + yi) + hf\ 



From inequalities (57) and (58), we find



yi + m 



< 



{y'i + yi + hif 



40 



{y'i + yi + hi? ~ (37)2n(d+ 1)3/2 (In n)3/2pr 
Observing that $h_i \le (1/40)(y'_i + y_i)$, we can now lower bound the exponent in (59) by



9(d + l)lnn 



(81/80)40 



(37)2n(d + 1)3/2 (In n)3/2pi 



< 0.81/n. 



Lemma 5.3.7 (Almost Gaussian, Jacobians) Under the conditions of Lemma 5.3.5, 

^ g.0094/n 



det (^imA 



det 



Proof We first note that 



det 



To compute 



\ d{hi,gi) ) 



d{P0,Pi) 

,d{hi,gi) 
, we note that 

dpifi 



|det {Ci 



{y'i + yir+^' 



dhi 

dPij{hi,gi,k) 



-, and 



{y'i + yi + hiY 

fO ifiT^A; 

I -— P — r- otherwise. 

Vi+yi+hi 
Thus, the matrix of partial derivatives is lower-triangular, and its determinant has absolute
value 

'd{pifi,Pi 



Thus, 



det 



dihi,9i 



{y'i + yi + hir+^' 



det 



/ d{pi,o,Pi) 
\ d{hi,gi) 



y'i + yi + hi 
y'i + yi 



1 + 



hi 



d+2 



d+2 



yi + m 



< 1 + 



d+2 



3(d+2)hi 

< e ^< 

< g0.094/n^ 



by (57) 



by d > 3 and (58). 
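
As a sanity check on how the constants in Lemmas 5.3.4 through 5.3.7 fit together, the following short calculation multiplies the pointwise bound by the Jacobian bound and then takes the product over the n constraints. It assumes the exponents 0.81/n and 0.094/n as they appear at the ends of the proofs of Lemmas 5.3.6 and 5.3.7, and it writes $R_i$ (a label introduced here for illustration, not notation from the paper) for the ratio of the two induced densities at a point of B:

$$R_i \;\le\; e^{0.81/n} \cdot e^{0.094/n} \;=\; e^{0.904/n} \;\le\; e^{1/n}, \qquad \prod_{i=1}^{n} R_i \;\le\; \left(e^{1/n}\right)^{n} \;=\; e.$$

Under these assumptions, the factor of e in part (b) of Lemma 5.3.4 is exactly what n single-variable factors of at most $e^{1/n}$ can contribute.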
6 Discussion and Open Questions



The results proved in this paper support the assertion that the shadow-vertex simplex 
algorithm usually runs in polynomial time. However, our understanding of the performance 
of the simplex algorithm is far from complete. In this section, we discuss problems in the 
analysis of the simplex algorithm and in the smoothed analysis of algorithms that deserve 
further study. 

6.1 Practicality of the analysis 

While we have demonstrated that the smoothed complexity of the shadow-vertex algorithm 
is polynomial, the polynomial we obtain is quite large. Yet, we believe that the present 
analysis provides some intuition for why the shadow-vertex simplex algorithm should run 
quickly. It is clear that the proofs in this paper are very loose and make many worst-case 
assumptions that are unlikely to be simultaneously valid. We did not make any attempt 
to optimize the coefficients or exponents of the polynomial we obtained. We have not 
attempted such optimization for two reasons: it would increase the length of the paper
and probably make it more difficult to read; and we believe that it should be possible to
improve the bounds in this paper by simplifying the analysis rather than making it more 
complicated. Finally, we point out that most of our intuition comes from the shadow size 
bound, which is not so bad as the bound for the two-phase algorithm. 

6.2 Further analysis of the simplex algorithm 

• While we have analyzed the shadow-vertex pivot rule, there are many other pivot rules 
that are more commonly used in practice. Knowing that one pivot rule usually takes 
polynomial time makes it seem reasonable that others should as well. We consider the 
maximum-increase and steepest-increase rules, as well as randomized pivot rules, to 
be good candidates for smoothed analysis. However, the reader should note that there 
is a reason that the shadow-vertex pivot rule was the first to be analyzed: there is a 
simple geometric description of the vertices encountered by the algorithm. For other 
pivot rules, the only obvious characterization of the vertices encountered is by iterative 
application of the pivot rule. This iterative characterization introduces dependencies 
that make probabilistic analysis difficult. 

• Even if we cannot perform a smoothed analysis of other pivot rules, we might be able 
to measure the diameter of a polytope under smoothed analysis. We conjecture that 
it is expected polynomial in m, d, and $1/\sigma$.

• Given that the shadow-vertex simplex algorithm can solve the perturbations of linear 
programs efficiently, it seems natural to ask if we can follow the solutions as we 
unperturb the linear programs. For example, having solved an instance of type (4), 
it makes sense to follow the solution as we let $\sigma$ approach zero. Such an approach
is often called a homotopy or path-following method. So far, we know of no reason 
that there should exist an A for which one cannot follow these solutions in expected 
polynomial time, where the expectation is taken over the choice of G. Of course, if 
one could follow these solutions in expected polynomial time for every A, then one
would have a randomized strongly-polynomial time algorithm for linear programming! 

6.3 Degeneracy 

One criticism of our model is that it does not allow for degenerate linear programs. It is 
an interesting problem to find a model of local perturbations that will preserve meaningful 
degeneracies. It seems that one might be able to expand upon the ideas of Todd [Tod91] 
to construct such a model. Until such a model presents itself and is analyzed, we make the 
following observations about types of degeneracies. 

• In primal degeneracy, a single feasible vertex may correspond to multiple bases, I. 
In the polar formulation, this corresponds to an unexpectedly large number of the 
$a_i$'s lying in a $(d-1)$-dimensional affine subspace. In this case, a simplex method
may cycle, spending many steps switching among bases for this vertex, failing to
make progress toward the objective function. Unlike many simplex methods, the 
shadow-vertex method may still be seen to be making progress in this situation: each 
successive basis corresponds to a simplex that maps to an edge further along the 
shadow. It just happens that these edges are co-linear.

A more severe version of this phenomenon occurs when the set of feasible points of a 
linear program lies in an affine subspace of fewer than d dimensions. By considering
perturbations to the constraints under the condition that they do not alter the affine 
span of the set of feasible points, the results on the sizes of shadows obtained in 
Section 4 carry over unchanged. However, how such a restriction would affect the 
results in Section 5 is presently unclear. 

• In dual degeneracy, the optimal solution of the linear program is a face of the polyhedron
rather than a vertex. This does not appear to be a very strong condition, and we
expect that one could extend our analysis to a model that preserves such degeneracies. 

6.4 Smoothed Analysis 

We believe that many algorithms will be better understood through smoothed analysis. 
Scientists and engineers routinely use algorithms with poor worst-case performance. Often, 
they solve problems that appear intractable from the worst-case perspective. While we do 
not expect smoothed analysis to explain every such instance, we hope that it can explain 
away a significant fragment of the discrepancy between the algorithmic intuitions of engineers
and theorists. To make it easier to apply smoothed analyses, we briefly discuss some
alternative definitions of smoothed analysis. 

Zero-preserving perturbations: One criticism of smoothed complexity as defined 
in Section 1.2 is that the additive Gaussian perturbations destroy any zero-structure that 
the problem has, as it will replace the zeros with small values. One can refine the model 
to fix this problem by studying zero-preserving perturbations. In this model, one applies 
Gaussian perturbations only to non-zero entries. Zero entries remain zero. 
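
A minimal sketch of a zero-preserving perturbation, under stated assumptions: the input is a flat list of numbers, the noise on each non-zero entry is Gaussian with standard deviation sigma, and the function name and the `rng` parameter are hypothetical choices made for this illustration. Any additional scaling of the noise that the model of Section 1.2 may apply is omitted here.

```python
import random


def zero_preserving_perturbation(x, sigma, rng=None):
    """Perturb only the non-zero entries of x with N(0, sigma^2) noise.

    Zero entries are returned unchanged, so the zero-structure survives.
    The name and signature are illustrative assumptions, not from the paper.
    """
    rng = rng if rng is not None else random.Random()
    return [xi + sigma * rng.gauss(0.0, 1.0) if xi != 0 else xi for xi in x]


# Example: a sparse constraint row keeps its sparsity pattern.
row = [3.0, 0.0, -1.5, 0.0]
print(zero_preserving_perturbation(row, sigma=0.1, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes the perturbation reproducible, which is convenient when comparing an algorithm's behavior on the same perturbed instance.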

Relative perturbations: A further refinement is the model of relative perturbations. 
Under a relative perturbation, an input is mapped to a constant multiple of itself. For 
example, a reasonable definition would be to map each variable by

$$x \mapsto x(1 + \sigma g),$$

where $g$ is a Gaussian random variable of mean zero and variance 1. Thus, each number
is usually mapped to one of similar magnitude, and zero is always mapped to zero. When 
we measure smoothed complexity under relative perturbations, we call it relative smoothed 
complexity. Smoothed complexity as defined in Section 1.2 above can be called absolute
smoothed complexity if clarification is necessary. It would be very interesting to know if the 
simplex method has polynomial relative smoothed complexity. 
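
The relative perturbation described above can be sketched as follows. This is an illustration only: the function name, the list-of-floats input, and the `rng` parameter are assumptions introduced here, and an independent Gaussian g is drawn for each variable.

```python
import random


def relative_perturbation(x, sigma, rng=None):
    """Map each entry x_i to x_i * (1 + sigma * g_i), with g_i ~ N(0, 1).

    Zero entries stay zero, and each non-zero entry is usually mapped to
    a value of similar magnitude when sigma is small.
    """
    rng = rng if rng is not None else random.Random()
    return [xi * (1.0 + sigma * rng.gauss(0.0, 1.0)) for xi in x]


# Example: entries of very different magnitudes are perturbed proportionally.
print(relative_perturbation([1000.0, 0.0, 0.001], sigma=0.01, rng=random.Random(0)))
```

Because the noise is multiplicative, the same sigma produces a small absolute change in a small entry and a large absolute change in a large one, which is the sense in which the perturbation is relative.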

$\epsilon$-smoothed-complexity: Even if we cannot bound the expectation of the running
time of an algorithm under perturbations, we can still obtain computationally meaningful
results for an algorithm by proving that it has $\epsilon$-smoothed-complexity $f(n, \sigma, \epsilon)$, by which
we mean that the probability that it takes time more than $f(n, \sigma, \epsilon)$ is at most $\epsilon$:

$$\forall x \in X_n, \quad \Pr\left[ C(A, x + \sigma \max(x)\, g) \le f(n, \sigma, \epsilon) \right] \ge 1 - \epsilon.$$
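
To make the definition concrete, here is a hedged Monte Carlo sketch that estimates, for one fixed input x, a value f such that the measured cost is at most f in roughly a 1 - ε fraction of perturbed trials. It is not a proof technique and does not take the maximum over all inputs that the definition requires. The helper names, the toy cost function (element shifts in insertion sort), and the reading of max(x) as the largest absolute entry of x are all assumptions made for this illustration.

```python
import math
import random


def perturb(x, sigma, rng):
    """Absolute perturbation x + sigma * max|x_i| * g, with g standard Gaussian."""
    scale = sigma * max(abs(xi) for xi in x)
    return [xi + scale * rng.gauss(0.0, 1.0) for xi in x]


def empirical_eps_quantile(cost, x, sigma, eps, trials=1000, seed=0):
    """Estimate f with Pr[cost(perturbed x) <= f] >= 1 - eps for one fixed x."""
    rng = random.Random(seed)
    costs = sorted(cost(perturb(x, sigma, rng)) for _ in range(trials))
    k = math.ceil((1.0 - eps) * trials) - 1  # index of the empirical (1 - eps)-quantile
    return costs[max(k, 0)]


def insertion_sort_shifts(a):
    """Toy cost measure: number of element shifts performed by insertion sort."""
    a = list(a)
    shifts = 0
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = key
    return shifts


# Example: a reversed list is the worst case for insertion sort.
x = list(range(20, 0, -1))
print(empirical_eps_quantile(insertion_sort_shifts, x, sigma=0.1, eps=0.05))
```

Repeating the estimate over a family of adversarially chosen inputs, and taking the largest value found, would give an empirical lower bound on the quantity that the definition maximizes over.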